Machine Learning Approaches for Gene Regulatory Network Inference Using Single-Cell RNA Sequencing Data

ZIVKOVIC, BOGDANA

2023/2024

Abstract

Gene regulatory networks (GRNs) are essential for understanding the molecular interactions that drive biological processes, from development and metabolism to disease progression. A GRN can be represented as a directed graph in which nodes correspond to genes and directed edges indicate regulatory interactions between them. GRN inference, the process of reconstructing these networks, has traditionally relied on bulk RNA-sequencing (RNA-seq) data. The rise of single-cell RNA-sequencing (scRNA-seq) brings both opportunities and challenges: it enables the exploration of cellular heterogeneity at unprecedented resolution, but it also introduces significant technical noise and variability. Machine learning methods for GRN inference from scRNA-seq data fall into unsupervised and supervised approaches. Unsupervised methods identify regulatory interactions without prior knowledge of interacting gene pairs, whereas supervised methods use known networks to train models that predict gene interactions. This thesis investigates four such methods for GRN inference from scRNA-seq data: the unsupervised GENIE3 and scGeneRAI, and the supervised GNNLink and STGRNS. The performance of all four methods is evaluated, and specific enhancements are applied to GNNLink. These enhancements are (i) transcription factor frequency lookup tables, used to improve performance, and (ii) an unsupervised version of GNNLink whose training set is generated from expression data alone, based on the Pearson correlation between genes. In addition, irrelevant genes are filtered out of the inputs to the unsupervised methods and out of the training set for the unsupervised version of GNNLink, so that the predictions are more relevant to the ground-truth networks against which the inferred GRNs are evaluated. By refining these computational methods, this research aims to improve the reliability and applicability of GRN inference across diverse biological contexts.
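
The idea of generating a training set from Pearson correlation alone can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not state a threshold, whether absolute correlation is used, or how negative examples are chosen, so the function name `correlation_pseudo_labels`, the cut-off of 0.8 on |r|, and the toy gene names are all assumptions made for illustration.

```python
import numpy as np


def correlation_pseudo_labels(expr, genes, tf_names, threshold=0.8):
    """Label directed (TF, target) pairs from Pearson correlation.

    expr      : array of shape (n_cells, n_genes), expression values
    genes     : list of gene names, one per column of expr
    tf_names  : genes treated as candidate regulators (edge sources)
    threshold : assumed cut-off on |r|; not taken from the thesis
    """
    corr = np.corrcoef(expr, rowvar=False)  # (n_genes, n_genes) Pearson r
    col = {g: i for i, g in enumerate(genes)}
    pairs = []
    for tf in tf_names:
        i = col[tf]
        for j, target in enumerate(genes):
            if i == j:
                continue  # skip self-loops
            # Pseudo-label: 1 if strongly correlated, 0 otherwise.
            label = int(abs(corr[i, j]) >= threshold)
            pairs.append((tf, target, label))
    return pairs


# Toy data (hypothetical): G1 follows TF_A closely, G2 is independent noise.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
expr = np.column_stack([
    base,
    2.0 * base + 0.01 * rng.normal(size=200),
    rng.normal(size=200),
])
genes = ["TF_A", "G1", "G2"]

for tf, target, label in correlation_pseudo_labels(expr, genes, ["TF_A"]):
    print(f"{tf} -> {target}: {label}")
```

Pairs labelled this way could then stand in for the known network that a supervised model would otherwise require. Correlation is symmetric, so the edge direction here comes only from the assumed list of transcription factors, not from the data.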

Record details

Faculty/Department: Dipartimento di Matematica "Tullio Levi-Civita" - DM
Degree programme: Data Science, Master's degree (Laurea Magistrale, D.M. 270/2004)
Academic year: 2023
Keywords: GRN Inference; Expression Data; Machine Learning; Biological Data; Data Analysis
Supervisor: NAVARIN, NICOLO'
Appears in types: Master's theses (Lauree magistrali)

Files in this item:

File	Size	Format
Data_Science_MsC_Thesis____UniPD-Final Thesis.pdf (open access)	4.39 MB	Adobe PDF

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/80908