Co-expressed protein modules in single-cell proteomics: an exploratory method comparison

MENNA, EMMA
2024/2025

Abstract

Single-cell proteomics (SCP) is the study of protein expression in individual cells. Over the past decade, SCP has gained importance relative to genome and transcriptome studies because of the functional role of proteins, whose expression levels can provide useful insights when cells are compared across conditions, cell types and other variables. However, SCP faces major challenges related both to the nature of the data and to the difficulty of acquiring it. One of the most critical issues is data sparsity, or missingness, aggravated by the limitations of currently available imputation techniques, which often introduce considerable bias. Traditional omics analysis usually employs data modelling to identify differentially expressed proteins across conditions, followed by a gene set analysis to give the results a biological interpretation. The main objective of this thesis is to explore and discuss the use of clustering, biclustering and community detection methods to identify modules of co-expressed proteins in cells of the same type (from two example datasets), as an alternative to univariate protein modelling. The main algorithms used are k-means, QUBIC and Leiden community detection. Furthermore, a statistical test based on a non-parametric null distribution of cluster silhouettes is implemented to identify the most relevant clusters. The relevant clusters reported by the different methods are then tested for enrichment with Fisher's over-representation test and compared to evaluate their biological significance.
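The enrichment step mentioned at the end of the abstract can be illustrated with a minimal sketch of a one-sided Fisher over-representation test, computed here as the upper tail of the hypergeometric distribution. This is not the thesis code: the function name `over_representation_pvalue`, the protein identifiers and the gene set are hypothetical and chosen only for illustration.

```python
from math import comb


def over_representation_pvalue(module, gene_set, background):
    """One-sided Fisher exact test for over-representation.

    Returns P(X >= k), where X is the number of gene-set members found in a
    random draw of len(module) proteins from the background (hypergeometric
    upper tail). All names here are illustrative assumptions.
    """
    background = set(background)
    # Only proteins that belong to the background can be counted.
    module = set(module) & background
    gene_set = set(gene_set) & background

    n_background = len(background)   # N: all quantified proteins
    n_gene_set = len(gene_set)       # K: annotated proteins in the background
    n_module = len(module)           # n: proteins in the candidate module
    overlap = len(module & gene_set) # k: annotated proteins in the module

    total = comb(n_background, n_module)
    tail = sum(
        comb(n_gene_set, i) * comb(n_background - n_gene_set, n_module - i)
        for i in range(overlap, min(n_gene_set, n_module) + 1)
    )
    return tail / total


# Hypothetical example: 20 background proteins, a 5-protein gene set,
# and a 4-protein module that shares 3 proteins with the gene set.
background = [f"P{i}" for i in range(1, 21)]
gene_set = ["P1", "P2", "P3", "P4", "P5"]
module = ["P1", "P2", "P3", "P6"]
p_value = over_representation_pvalue(module, gene_set, background)
```

In practice the test would be repeated for every module and gene set, so the resulting p-values would normally be adjusted for multiple testing; the choice of background (for example, all proteins quantified in the dataset) is an assumption that strongly affects the result.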
Keywords: Proteomics; Clustering methods; Community detection
File: Menna_Emma.pdf (open access, Adobe PDF, 3.82 MB)

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/84087