Benchmarking ANI Calculation Software: A Python-Based Comparison of FastANI and skani

BARLETTA, LORENZO

2024/2025

Abstract

Average Nucleotide Identity (ANI) is central to the delineation of prokaryotic species and to metagenomic workflows. BLAST-based methods such as OrthoANI [7] are highly accurate but slow. Alignment-free methods such as FastANI [5] and skani [8] trade some accuracy for speed and for scalability to ever-growing genome databases. In light of the performance improvement provided by skani, we developed a Python-based benchmarking framework, designed for easy integration into Python scripts, that compares FastANI and skani and uses PyOrthoANI [6] as an accuracy reference. For each reference genome, we generate in silico mutation series from a predefined array of mutation rates across multiple regimes: uniformly random SNPs, position-structured SNPs, short indels, and horizontal gene transfer (HGT) replacements. For each reference–query pair, we record ANI together with a coverage proxy (aligned fragment counts for FastANI; alignment fractions for skani). Across SNP gradients, both tools show the expected monotonic decline in ANI. Under gene content changes (indels, HGT), both FastANI and skani typically report near-maximal ANI on the aligned regions while alignment fractions drop. We also observe unexpected runtime behavior at higher mutation rates when measuring computation speed across the mutation rate array, whereas runtime grows as expected with the number of pairwise computations.
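
The uniformly random SNP regime described above can be illustrated with a minimal sketch. This is not the thesis's code: the function names, the toy random reference sequence, and the values in MUTATION_RATES are all hypothetical, and the position-wise identity computed here is only a stand-in for the ANI that FastANI, skani, or PyOrthoANI would report on real genomes.

```python
import random

BASES = "ACGT"

def mutate_snps(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute a fraction `rate` of positions, each with a different base."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    n_mut = round(len(seq) * rate)
    # Sample distinct positions so the realised mutation rate is exact.
    positions = rng.sample(range(len(seq)), n_mut)
    out = list(seq)
    for pos in positions:
        alternatives = [b for b in BASES if b != out[pos]]
        out[pos] = rng.choice(alternatives)
    return "".join(out)

def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

rng = random.Random(42)  # fixed seed so the series is reproducible

# Toy reference sequence (assumption: a real run would load a genome FASTA).
reference = "".join(rng.choice(BASES) for _ in range(10_000))

# Hypothetical mutation rate array; the thesis's actual values are not given here.
MUTATION_RATES = [0.0, 0.01, 0.05, 0.10]

# One mutated query per rate, all derived from the same reference.
series = {rate: mutate_snps(reference, rate, rng) for rate in MUTATION_RATES}

for rate, query in series.items():
    print(f"rate={rate:.2f} identity={identity(reference, query):.4f}")
```

Because substitutions never change sequence length, the identity here falls linearly with the SNP rate. Indel and HGT regimes would additionally change which regions can be aligned at all, which is why a coverage proxy is recorded alongside ANI.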

Record

	Faculty/Department
	
				Dipartimento di Biologia - DiBio
			
	Degree programme
	
				BIOLOGIA MOLECOLARE (Molecular Biology), first-level degree (D.M. 270/2004)
			
	Academic year
	
				2024
			
	English title
	
				Benchmarking ANI Calculation Software: A Python-Based Comparison of FastANI and skani
			
	Keywords
	
				Computational
Metagenomics
Python
			
	Supervisor
	
				CAMPANARO, STEFANO
			
	Appears in collections:
	
				Bachelor's theses

Files in this item:

File	Size	Format
Barletta_Lorenzo.pdf (open access)	1.15 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/91932