Benchmarking tools for long reads mapping
GASPARDO, RUDI
2023/2024
Abstract
The thesis evaluates and benchmarks tools for aligning the long reads produced by third-generation sequencing technologies, specifically PacBio's SMRT sequencing and Oxford Nanopore Technologies (ONT). The aligners Minimap2, GraphMap, lra, lordfast, minialign, ngmlr and Bowtie2 were studied, and their performance was compared on datasets generated with the widely used long-read simulators SimLoRD and PBSim2. The experiments consisted of many simulations that covered a wide range of parameter settings, chosen to mimic the error profiles characteristic of each sequencing technology, and that used several types of reference files. Alignment quality was assessed with metrics such as Precision, Recall and Accuracy, computed against the ground truth generated and tagged by the simulators. The analysis of the results highlights how performance relates to specific characteristics of the reference files and to the types of simulation parameters used. To complete the analysis, alignment quality statistics computed with samtools, the de facto standard tool, are also compared, together with computational resource consumption and efficiency metrics such as throughput and memory usage.

| File | Size | Format | Access |
|---|---|---|---|
| Gaspardo_Rudi.pdf | 10.93 MB | Adobe PDF | Restricted |
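The abstract scores each aligner by Precision and Recall against the simulators' ground truth. The exact matching criterion used in the thesis is not stated here, so the following is only a minimal Python sketch of one common convention: a read counts as correctly mapped when the aligner places it on the same reference sequence as its simulated origin and within a tolerance of the true start position. The `evaluate` function, the 50 bp default tolerance and the dictionary input format are hypothetical; parsing the aligners' SAM output and the simulators' truth files is not shown. Accuracy is left out because its definition for read mapping varies between studies.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: (reference sequence name, 0-based start position).
Position = Tuple[str, int]


def evaluate(truth: Dict[str, Position],
             mapped: Dict[str, Optional[Position]],
             tolerance: int = 50) -> Tuple[float, float]:
    """Return (precision, recall) of read placements versus simulated truth.

    Assumed convention (not necessarily the thesis's):
      precision = correctly placed reads / reads the aligner mapped
      recall    = correctly placed reads / all simulated reads
    """
    correct = 0
    n_mapped = 0
    for read_id, (true_ref, true_start) in truth.items():
        hit = mapped.get(read_id)
        if hit is None:
            continue  # unmapped read: lowers recall only
        n_mapped += 1
        ref, start = hit
        # Correct if on the same sequence and within `tolerance` bases of the truth.
        if ref == true_ref and abs(start - true_start) <= tolerance:
            correct += 1
    precision = correct / n_mapped if n_mapped else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


# Toy example with made-up reads (not data from the thesis).
truth = {"r1": ("chr1", 1000), "r2": ("chr1", 5000),
         "r3": ("chr2", 200), "r4": ("chr2", 9000)}
mapped = {"r1": ("chr1", 1010),  # correct: 10 bp off
          "r2": ("chr2", 5000),  # wrong reference sequence
          "r3": ("chr2", 230),   # correct: 30 bp off
          "r4": None}            # unmapped
print(evaluate(truth, mapped))  # (0.6666666666666666, 0.5)
```

Under this convention an unmapped read lowers Recall but not Precision, while a read placed at the wrong locus lowers both; tightening `tolerance` makes both metrics stricter.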
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/80166