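Presentation and comparison of methods for efficient data storage for bioinformatics (original title: Presentazione e confronto di metodi per l'archiviazione efficiente di dati per la bioinformatica)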

In recent decades, new technologies have been developed that make it possible to digitize the genomes studied in the laboratory. The resulting sequencing data, however, are rapidly filling public databases, so many modern tools aim to store these data efficiently. In this analysis we compare two promising tools for compressing such data. Both operate on k-mers (substrings of length k), a construct designed for analyzing FASTA files, the output of new sequencing techniques. As we will see, of the Counting de Bruijn Graph method and the UST tool, the latter proves to be the more efficient at storing genetic data. The analysis, however, only compares the size of the output files and does not consider other aspects of these tools; it should therefore be regarded as partial.

SALVIATI, UMBERTO

2021/2022

Record

	Faculty/Department
	
				Dipartimento di Ingegneria dell'Informazione - DEI
			
	Degree programme
	
				Computer Engineering (Ingegneria Informatica), first-level degree (Laurea di Primo Livello, D.M. 270/2004)
			
	Academic year
	
				2021
			
	English title
	
				Presentation and comparison of methods for efficient data storage for bioinformatics
			
	Keywords
	
				bioinformatics
				data
				compression
				efficiency
			
	Supervisor
	
				COMIN, MATTEO
			
	Appears in collections:
	
				Bachelor's degree theses (Lauree triennali)

Files in this item:

File	Size	Format
Umberto_Salviati.pdf (open access)	1.04 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/34670