Methods for compressing k-mers set with counters

ROSSIGNOLO, ENRICO
2022/2023

Abstract

This work seeks an efficient way to compress k-mer sets with counters: such sets take up a lot of disk space, yet they offer several advantages over genomes or sets of genomes. Several strategies are proposed for exploring compacted de Bruijn graphs (cdBGs) so as to produce smaller files than UST, and the encoding of the counts has been revised. A new application implements these strategies and fixes a bug in UST that caused the counts to be ordered incorrectly. The results show that compression can be improved with respect to UST, depending on the density of the graph. Finally, a small value of k leads to denser graphs and therefore to better results.
2022
Keywords: compression, kmers, set
Files in this item:
Rossignolo_Enrico.pdf (open access, 4.08 MB, Adobe PDF)

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/45148