Presentation and comparison of methods for the efficient storage of bioinformatics data
SALVIATI, UMBERTO
2021/2022
Abstract
In recent decades, new technologies have been developed that make it possible to digitize the genomes studied in the laboratory. The resulting sequencing data, however, are rapidly filling public databases, so many modern tools aim to store these data efficiently. In this analysis we compare two promising tools for compressing such data. Both operate on k-mers (substrings of length k), a construct designed for analyzing FASTA files, the output of new sequencing techniques. As we will see, of the Counting de Bruijn Graph method and the UST tool, the latter proves the more efficient at storing genetic data. This analysis, however, compares only the size of the output files and does not consider other aspects of these tools; it is therefore a partial one.

File | Access | Size | Format
---|---|---|---
Umberto_Salviati.pdf | open access | 1.04 MB | Adobe PDF
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 License.
https://hdl.handle.net/20.500.12608/34670