Co-expressed protein modules in single-cell proteomics: an exploratory method comparison
MENNA, EMMA
2024/2025
Abstract
Single-Cell Proteomics (SCP) is a scientific field focused on studying protein expression in individual cells. Over the past decade, SCP has gained importance relative to genomic and transcriptomic studies because of the functional role of proteins, whose expression levels can provide useful insights when comparing cells across conditions, cell types and other variables. However, SCP faces major challenges related both to the nature of the data and to the difficulty of acquiring it. One of the most critical issues is data sparsity, or missingness, aggravated by the limitations of currently available imputation techniques, which often introduce considerable bias. Traditional omics analysis usually employs data modelling to identify differentially expressed proteins across conditions, followed by a Gene Set Analysis to give the results a biological interpretation. The main objective of this thesis is to explore and discuss the use of clustering, biclustering and community detection methods to identify modules of co-expressed proteins in cells of the same type (from two example datasets), as an alternative to univariate protein modelling. The main algorithms used are k-means, QUBIC and Leiden community detection. Furthermore, a statistical test based on a non-parametric null distribution of cluster silhouettes is implemented to identify the most relevant clusters. The relevant clusters reported by the different methods are then tested for enrichment with Fisher's over-representation test and compared to evaluate their biological significance.
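As an illustration of the final enrichment step, Fisher's over-representation test for a single module can be written as a one-sided hypergeometric tail probability. The following is a minimal Python sketch under stated assumptions, not the thesis implementation: the function name, the protein identifiers and the choice of background are hypothetical, and the abstract does not say which language or library was used.

```python
from math import comb


def over_representation_pvalue(module, gene_set, background):
    """One-sided Fisher exact p-value for enrichment of gene_set in module.

    All three arguments are iterables of protein identifiers (hypothetical
    here). Identifiers outside the background are ignored.
    """
    background = set(background)
    module = set(module) & background
    gene_set = set(gene_set) & background

    N = len(background)          # proteins that could have been drawn
    K = len(gene_set)            # annotated proteins in the background
    n = len(module)              # proteins in the module
    k = len(module & gene_set)   # annotated proteins in the module

    # P(X >= k) for X ~ Hypergeometric(N, K, n), using exact integers.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Toy example with made-up identifiers: 20 quantified proteins,
# a 5-protein gene set and a 6-protein module sharing 4 of them.
background = [f"P{i}" for i in range(20)]
gene_set = ["P0", "P1", "P2", "P3", "P4"]
module = ["P0", "P1", "P2", "P3", "P10", "P11"]

p_value = over_representation_pvalue(module, gene_set, background)
print(f"enrichment p-value: {p_value:.4f}")
```

The background is taken here to be the proteins quantified in the dataset rather than the whole proteome, which is an assumption of this sketch. When many gene sets are tested, the resulting p-values would usually be adjusted for multiple testing.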
File | Access | Size | Format
---|---|---|---
Menna_Emma.pdf | Open access | 3.82 MB | Adobe PDF
The text of this website © Università degli Studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/84087