Benchmarking ANI Calculation Software: A Python-Based Comparison of FastANI and skani
BARLETTA, LORENZO
2024/2025
Abstract
Average Nucleotide Identity (ANI) is central to the delineation of prokaryotic species and to metagenomic workflows. BLAST-based methods such as OrthoANI [7] are highly accurate but slow. Alignment-free methods such as fastANI [5] and skani [8] trade some accuracy for speed and for scalability to ever-growing genome databases. In light of the performance improvement provided by skani, we developed a Python-based benchmarking framework that compares fastANI and skani, using PyOrthoANI [6] as the accuracy reference so that the comparison integrates easily into Python scripts. For each reference genome we generate in silico mutation series from a predefined array of mutation rates across multiple regimes: uniformly random SNPs, position-structured SNPs, short indels, and horizontal gene transfer (HGT) replacements. For each reference–query pair, we record ANI together with a coverage proxy (aligned fragment counts for fastANI; alignment fractions for skani). Across SNP gradients, both tools show the expected monotonic decline in ANI. Under gene content changes (indels, HGT), both fastANI and skani typically report near-maximal ANI on the aligned regions while alignment fractions drop. When timing the tools across the mutation-rate array, we also observe unexpected runtime behavior at higher mutation rates, whereas runtime grows as expected with the number of pairwise comparisons.

| File | Size | Format | |
|---|---|---|---|
| Barletta_Lorenzo.pdf (open access) | 1.15 MB | Adobe PDF | View/Open |
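
The uniformly random SNP regime mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `mutate_snps` and `identity`, the toy random reference sequence, and the mutation-rate array are all hypothetical, and the positional identity computed here is only a sanity check, not ANI as estimated by fastANI or skani.

```python
import random

BASES = "ACGT"


def mutate_snps(sequence, rate, seed=0):
    """Return a copy of `sequence` with round(len * rate) positions substituted.

    Hypothetical helper: positions are drawn uniformly without replacement,
    and each chosen base is replaced by a different base from ACGT.
    """
    rng = random.Random(seed)
    n_mut = round(len(sequence) * rate)
    positions = rng.sample(range(len(sequence)), n_mut)
    seq = list(sequence)
    for pos in positions:
        current = seq[pos]
        seq[pos] = rng.choice([b for b in BASES if b != current])
    return "".join(seq)


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Toy reference sequence; a real run would read a genome from a FASTA file.
reference = "".join(random.Random(1).choice(BASES) for _ in range(10_000))

# Assumed mutation-rate array, for illustration only.
rates = [0.0, 0.01, 0.05, 0.10]
series = {rate: mutate_snps(reference, rate, seed=42) for rate in rates}

for rate, query in series.items():
    print(f"rate={rate:.2f} identity={identity(reference, query):.4f}")
```

Because substitutions preserve sequence length, the positional identity of each query falls by exactly the applied rate, which gives a known ground truth against which reported ANI values could be compared. Indel and HGT regimes change sequence length or content and would need a different check.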
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/91932