Benchmarking tools for long reads mapping

This thesis evaluates and benchmarks tools for aligning long reads produced by third-generation sequencing technologies, specifically PacBio's SMRT sequencing and Oxford Nanopore Technologies (ONT). The aligners Minimap2, GraphMap, lra, lordfast, minialign, ngmlr and Bowtie2 were studied, and their performance was compared on datasets generated with the popular long-read simulators SimLoRD and PBSim2. The experiments consisted of many simulations covering a wide range of parameter settings, chosen to mimic the error profiles characteristic of the sequencing technologies, and using various types of reference files. Alignment quality was assessed with metrics such as Precision, Recall and Accuracy, computed against the ground truth generated and tagged by the simulations. The analysis of the results highlights relationships with specific characteristics of the reference files and with the simulation parameters used. To complete the analysis, alignment-quality statistics obtained with samtools, the de facto standard, are also compared, together with computational resource consumption and efficiency metrics such as throughput and memory usage.
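As a rough illustration of how precision and recall can be computed against simulator ground truth, the sketch below uses definitions that are common in read-mapping benchmarks. These are assumptions, not necessarily the definitions used in the thesis. A read counts as correctly mapped if it lands on the true contig within a position tolerance; precision is the fraction of mapped reads that are correct, and recall is the fraction of all simulated reads that are correctly mapped. The names `Placement`, `is_correct`, `precision_recall` and the 50-base tolerance are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Where a read sits on the reference (hypothetical helper type)."""
    contig: str
    start: int  # 0-based leftmost reference coordinate


def is_correct(truth: Placement, mapped: Placement, tolerance: int = 50) -> bool:
    # Assumed criterion: same contig and start within `tolerance` bases.
    return truth.contig == mapped.contig and abs(truth.start - mapped.start) <= tolerance


def precision_recall(truth: dict, mapped: dict, tolerance: int = 50) -> tuple:
    """Return (precision, recall) for reads keyed by read ID.

    `truth` holds every simulated read; `mapped` holds only the reads
    the aligner actually placed (one primary alignment per read).
    """
    correct = sum(
        1
        for read_id, placement in mapped.items()
        if read_id in truth and is_correct(truth[read_id], placement, tolerance)
    )
    precision = correct / len(mapped) if mapped else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


# Toy data: four simulated reads, three of them mapped, two mapped correctly.
truth = {
    "read1": Placement("chr1", 1000),
    "read2": Placement("chr1", 5000),
    "read3": Placement("chr2", 200),
    "read4": Placement("chr2", 9000),
}
mapped = {
    "read1": Placement("chr1", 1010),  # within tolerance: correct
    "read2": Placement("chr2", 5000),  # wrong contig: incorrect
    "read3": Placement("chr2", 195),   # within tolerance: correct
}

precision, recall = precision_recall(truth, mapped)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

With this toy data, two of the three mapped reads are correct and two of the four simulated reads are recovered. In practice the placements would be parsed from the simulator's ground-truth output and from the aligner's SAM/BAM output.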

GASPARDO, RUDI

2023/2024

Record

DC record

	Faculty/Department
	
				Dipartimento di Ingegneria dell'Informazione - DEI
			
	Degree programme
	
				INGEGNERIA INFORMATICA (Computer Engineering), Laurea Magistrale (Master's degree) (D.M. 270/2004)
			
	Academic year
	
				2023
			
	English title
	
				Benchmarking tools for long reads mapping
			
	Keywords
	
				long reads mapping
alignment
bioinformatics
			
	Supervisor
	
				PIZZI, CINZIA
			
	Co-supervisor
	
				COMIN, MATTEO
			
	Appears in types:
	
				Master's theses

Files in this item:

File	Size	Format
Gaspardo_Rudi.pdf (restricted access)	10.93 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/80166