Analytical Solutions for Genomic Quantities in Finite Populations with Recombination
STENTELLA, TOMMASO
2023/2024
Abstract
Coalescence theory is a foundational concept in population genetics that describes the ancestry of genes in a population. It starts from the premise that all genomes in a finite population descend from a common ancestor, so the time to the most recent common ancestor depends on the demographic history. Across generations, however, mutations accumulate, giving rise to single nucleotide polymorphisms in the genomes observed at the present time. Two sampled genomes will therefore differ at a few positions, called heterozygous sites, and the number of such differences reflects the time separating the pair from their common ancestor. Recombination allows an individual to carry genomic regions copied from two different parental genomes, mixing multiple lineages along the genome. As a consequence, heterozygous sites are not uniformly distributed along a diploid genome. Their density varies as a result of recombination events, and the local density reflects the time to the last common ancestor of the maternal and paternal copies of a genomic region. The distribution of the density of heterozygous sites therefore carries information about the history of population size. Despite previous efforts, an exact derivation of this distribution is still lacking, so estimating population size variation is difficult and requires several simplifying assumptions. Recently, new approaches have made it possible to derive key quantities that describe the distribution of heterozygous sites. The theory accounts for arbitrary demographic histories, including bottlenecks and more general scenarios in which the population size is piecewise constant over several epochs. The thesis explores some special solutions of the model's equation, as well as the most general one, and compares the results with existing treatments of specific scenarios.
We also show that the theory reproduces the data obtained from simulations of human demographic models. Besides differential equations, the work relies on computational tools such as the Python and Julia programming languages, as well as more specialized simulation engines.

File | Access | Size | Format
---|---|---|---
stentella_tommaso.pdf | restricted access | 2.8 MB | Adobe PDF
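
The link the abstract draws between the density of heterozygous sites and the time to the common ancestor can be illustrated with a minimal Python sketch. It is not the derivation developed in the thesis, and it ignores recombination entirely. It assumes a diploid Wright-Fisher population in which two lineages coalesce at rate 1/(2N) per generation, a piecewise-constant population size, and a small per-site mutation rate so that the expected heterozygosity is approximately 2μE[T]. The function names, the bottleneck scenario, and the value of μ are hypothetical choices made for illustration.

```python
import math
import random


def expected_pairwise_tmrca(epoch_starts, sizes):
    """Expected pairwise coalescence time (generations) for piecewise-constant N.

    epoch_starts[i] is the time (generations ago) at which epoch i begins,
    with epoch_starts[0] == 0; sizes[i] is the diploid size in that epoch.
    The last epoch extends infinitely far into the past.
    E[T] is computed as the integral of the survival function P(T > t).
    """
    expected = 0.0
    log_survival = 0.0  # log P(no coalescence before the current epoch)
    for i, n in enumerate(sizes):
        rate = 1.0 / (2.0 * n)  # assumed pairwise coalescence rate
        if i + 1 < len(sizes):
            duration = epoch_starts[i + 1] - epoch_starts[i]
            p_stay = math.exp(-rate * duration)
        else:
            duration = math.inf
            p_stay = 0.0
        expected += math.exp(log_survival) * (1.0 - p_stay) / rate
        if i + 1 < len(sizes):
            log_survival -= rate * duration
    return expected


def sample_pairwise_tmrca(epoch_starts, sizes, rng):
    """Draw one pairwise coalescence time under the same assumptions.

    Exponential waiting times are memoryless, so a fresh waiting time can be
    drawn at the start of each epoch that the pair reaches without coalescing.
    """
    for i, n in enumerate(sizes):
        t = epoch_starts[i] + rng.expovariate(1.0 / (2.0 * n))
        if i + 1 == len(sizes) or t < epoch_starts[i + 1]:
            return t


if __name__ == "__main__":
    mu = 1.25e-8  # hypothetical per-site, per-generation mutation rate
    # Hypothetical bottleneck: N drops tenfold between 1000 and 2000 generations ago.
    starts = [0.0, 1000.0, 2000.0]
    sizes = [10000.0, 1000.0, 10000.0]
    exact = expected_pairwise_tmrca(starts, sizes)
    rng = random.Random(0)
    draws = [sample_pairwise_tmrca(starts, sizes, rng) for _ in range(20000)]
    print("E[T] analytic:", exact)
    print("E[T] simulated:", sum(draws) / len(draws))
    print("expected heterozygosity per site:", 2.0 * mu * exact)
```

Under these assumptions a bottleneck shortens the expected coalescence time relative to a constant population of the same present-day size, and therefore lowers the expected heterozygosity. Describing how this quantity varies along a recombining genome is the harder problem the thesis addresses.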
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/64901