Towards personalized disease risk prediction from metagenome analysis of the microbiome

Goretity, Árpád
2020/2021

Abstract

The human gut microbiome is a central topic of research in bioinformatics and computational biology. The increasingly widespread availability of high-throughput, whole-genome shotgun sequencing technologies makes it possible to obtain an unprecedented amount of genomic and metagenomic data. Various microbiota have been shown to correlate with health status and even with the early onset of diseases, the most prominent examples being Irritable Bowel Syndrome (IBS) and the more serious Inflammatory Bowel Diseases (IBD), such as Crohn's Disease (CD) and Ulcerative Colitis (UC). This naturally motivates predicting, based on the composition of the gut microbiome, whether patients suffer from or are at risk of developing these conditions. Such prediction is especially valuable in the context of an emerging industry: holistic, personalized, preventive health consulting. A data-centric approach to assessing future health risks opens up the possibility of combining metagenomic information with results from other 'omics (metabolomics, genetics, proteomics, etc.), in order to provide patients with a comprehensive and easier-to-understand picture of the effects of, and the associations between, various aspects of their lifestyle. Most approaches to metagenomic analysis have so far focused on taxon-level resolution, quantifying taxa at the genus, species, or strain level. However, biological functions are not in one-to-one correspondence with taxa: a given species may fulfill (and usually does fulfill) more than one function, while the same function may be provided by more than one species. In this thesis, we develop a metagenomic analysis pipeline based on individual genes and families of homologous genes. In contrast to using a large database of thousands of complete prokaryotic genomes, our approach requires sequence data only for selected individual genes and is therefore less storage-demanding. Relying on state-of-the-art read alignment software, we also demonstrate gains in processing speed. We further show that our approach performs only slightly worse than SHOGUN, a more resource-intensive state-of-the-art metagenome analysis toolkit. We then apply both pipelines to several datasets and build machine learning models for classifying stool samples as either healthy or IBD/IBS. Finally, we analyze the associations between gene families in both groups by applying the tools of network science to abundance correlation networks.
2020-07-20
73
metagenome, microbiome, preventive medicine, IBD
Files in this item:
File: tesi_GoretityDef.pdf
Access: open access
Size: 4.15 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/21447