Enhancing Protein Function Prediction: Leveraging Structured Data and Multi-Label Classification

MEHDIABADI, MAHTA
2022/2023

Abstract

Proteins play a vital role in many biological processes, and understanding their functions is crucial for advancing our knowledge of cellular mechanisms and developing targeted therapies. However, a large proportion of proteins lack experimental functional annotations because traditional biochemical experiments are costly and time-consuming. To bridge this gap, computational strategies for predicting protein function have gained prominence. In this thesis, we propose a novel approach to protein function prediction that combines a graph neural network with protein-protein interaction (PPI) networks and embeddings generated by a self-supervised pre-trained language model. The model exploits the rich information in the embeddings together with the topological information of proteins in the network. We train and evaluate the model on a dataset of 208,578 proteins, the largest compared with previous work. Additionally, we address the challenge of scaling graph neural networks to large networks by using subgraph sampling techniques.
Protein Function
Deep Learning
Gene Ontology
Files in this item:
MEHDIABADI_MAHTA.pdf (restricted access; 1.85 MB; Adobe PDF)


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/52325