Machine Learning Approaches for Gene Regulatory Network Inference Using Single-Cell RNA Sequencing Data
ZIVKOVIC, BOGDANA
2023/2024
Abstract
Gene Regulatory Networks (GRNs) are essential for understanding the molecular interactions that drive biological processes, from development and metabolism to disease progression. A GRN can be represented as a directed graph in which nodes correspond to genes and directed edges indicate regulatory interactions between them. GRN inference, the process of reconstructing these networks, has traditionally been performed on bulk RNA-sequencing (RNA-seq) data. The rise of single-cell RNA-sequencing (scRNA-seq) has brought both opportunities and challenges: it enables the exploration of cellular heterogeneity at unprecedented resolution, but it also introduces significant technical noise and variability. Machine learning methods for GRN inference from scRNA-seq data can be classified as unsupervised or supervised. Unsupervised methods identify regulatory interactions without prior knowledge of gene pairs, while supervised methods rely on known networks to train models that predict gene interactions. This thesis investigates four machine learning approaches for GRN inference from scRNA-seq data: GENIE3 and scGeneRAI, which are unsupervised, and GNNLink and STGRNS, which are supervised. The performance of all four methods is evaluated, and specific enhancements are applied to GNNLink: transcription factor frequency lookup tables are used to improve performance, and an unsupervised version of GNNLink is created that uses only expression data, generating its training set from the Pearson correlation between genes. Additionally, irrelevant genes are filtered out of both the unsupervised approaches and the training set of the unsupervised version of GNNLink, so that the predictions are more relevant to the various ground-truth networks against which the inferred GRNs can be evaluated.
By refining these computational methods, this research aims to improve the reliability and applicability of GRN inference across diverse biological contexts.
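As an illustration of how a training set might be derived from expression data alone, the following is a minimal sketch, assuming that gene pairs whose absolute Pearson correlation exceeds a cutoff are treated as candidate positive examples. The function name `pearson_pseudo_labels`, the `threshold` value, and the toy data are hypothetical and are not taken from the thesis; Pearson correlation is also symmetric, so this sketch yields undirected pairs rather than directed edges.

```python
import numpy as np


def pearson_pseudo_labels(expr, genes, threshold=0.8):
    """Return (gene_a, gene_b, r) for gene pairs with |Pearson r| >= threshold.

    expr  : 2-D array of shape (cells, genes) holding expression values.
    genes : list of gene names, one per column of expr.
    The threshold is an illustrative assumption, not a value from the thesis.
    """
    # rowvar=False treats each column (gene) as a variable,
    # giving a genes x genes correlation matrix.
    corr = np.corrcoef(expr, rowvar=False)
    pairs = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            r = float(corr[i, j])
            if abs(r) >= threshold:
                pairs.append((genes[i], genes[j], r))
    return pairs


# Toy expression matrix: 10 cells, 3 hypothetical genes.
base = np.arange(10.0)
expr = np.column_stack([
    base,                     # geneA
    2.0 * base + 1.0,         # geneB: linear function of geneA
    np.tile([1.0, -1.0], 5),  # geneC: alternating, weakly related to geneA
])
genes = ["geneA", "geneB", "geneC"]

pairs = pearson_pseudo_labels(expr, genes, threshold=0.8)
for gene_a, gene_b, r in pairs:
    print(f"{gene_a} - {gene_b}: r = {r:.3f}")
```

In this toy example only the geneA/geneB pair passes the cutoff. How such pairs would be oriented, balanced with negative examples, or filtered against transcription factor lists is left open here, since the abstract does not specify it.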
File: Data_Science_MsC_Thesis____UniPD-Final Thesis.pdf (open access, 4.39 MB, Adobe PDF)
https://hdl.handle.net/20.500.12608/80908