Machine Learning Approaches for Gene Regulatory Network Inference Using Single-Cell RNA Sequencing Data

ZIVKOVIC, BOGDANA
2023/2024

Abstract

Gene regulatory networks (GRNs) are essential for understanding the molecular interactions that drive biological processes, from development and metabolism to disease progression. A GRN can be represented as a directed graph in which nodes correspond to genes and directed edges indicate regulatory interactions between them. GRN inference, the process of reconstructing these networks, has traditionally been performed using bulk RNA-sequencing (RNA-seq) data. The rise of single-cell RNA-sequencing (scRNA-seq) has introduced both opportunities and challenges: it enables the exploration of cellular heterogeneity at unprecedented resolution, but it also introduces significant technical noise and variability. Machine learning methods for GRN inference from scRNA-seq data can be classified as unsupervised or supervised. Unsupervised methods identify regulatory interactions without prior knowledge of interacting gene pairs, while supervised methods rely on known networks to train models that predict gene interactions. This thesis investigates four machine learning approaches for GRN inference from scRNA-seq data: GENIE3 and scGeneRAI, which are unsupervised, and GNNLink and STGRNS, which are supervised. The performance of all four methods is evaluated, and specific enhancements are applied to GNNLink. These enhancements include using transcription factor frequency lookup tables to improve performance and creating an unsupervised version of GNNLink that uses only expression data, generating its training set from the Pearson correlation between genes. Additionally, irrelevant genes are filtered out of both the unsupervised approaches and the training set for the unsupervised version of GNNLink, so that the predictions are more relevant to the ground-truth networks against which the inferred GRNs are evaluated.
By refining these computational methods, this research aims to improve the reliability and applicability of GRN inference across diverse biological contexts.
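
As an illustration of the idea of deriving a training set from expression data alone, the following is a minimal sketch, not the procedure used in the thesis. It assumes a genes-by-cells expression matrix and labels gene pairs as pseudo-positive or pseudo-negative by thresholding the absolute Pearson correlation. The function name `correlation_pseudo_labels` and both threshold values are hypothetical choices made for this example. Because Pearson correlation is symmetric, the resulting pairs carry no direction.

```python
import numpy as np


def correlation_pseudo_labels(expr, genes, pos_threshold=0.8, neg_threshold=0.1):
    """Label gene pairs from expression data only (illustrative sketch).

    expr  : array of shape (n_genes, n_cells), one row per gene
    genes : list of gene names, aligned with the rows of expr
    The thresholds are hypothetical, not values taken from the thesis.
    """
    # np.corrcoef treats each row as a variable, so this is a
    # gene-by-gene matrix of Pearson correlation coefficients.
    corr = np.corrcoef(expr)

    positives, negatives = [], []
    n = len(genes)
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once
            r = abs(corr[i, j])
            if r >= pos_threshold:
                positives.append((genes[i], genes[j]))
            elif r <= neg_threshold:
                negatives.append((genes[i], genes[j]))
            # pairs between the two thresholds are left unlabeled
    return positives, negatives


# Toy data: G2 is a scaled copy of G1, and G3 is uncorrelated with both.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [1.0, -1.0, 0.0, -1.0, 1.0],
])
pos, neg = correlation_pseudo_labels(expr, ["G1", "G2", "G3"])
```

In this toy example the strongly correlated pair becomes a pseudo-positive and the two uncorrelated pairs become pseudo-negatives. A real pipeline would also need to handle genes with constant expression, for which the correlation is undefined.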
2023
GRN Inference
Expression Data
Machine Learning
Biological Data
Data Analysis
Files in this item:
File: Data_Science_MsC_Thesis____UniPD-Final Thesis.pdf
Access: open access
Size: 4.39 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/80908