On the Discovery of Significant Motifs in Genomic Sequences

Boscariol, Matteo
2013/2014

Abstract

In this thesis we study the statistical properties of certain families of motifs of the same length. We develop a method for approximating the expected number of frequent motifs of a family in random texts with independent characters. We give a bound on the approximation error and show that this bound is loose in practice. We also develop a test that verifies whether the number of frequent motifs can be approximated by a Poisson distribution.
2013-12-10
Keywords: data mining, genomic sequences, statistical significance, motifs, occurrence probability
Files in this item:
File: thesis.pdf
Access: open access
Size: 746.57 kB
Format: Adobe PDF

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/17709