Protein Intrinsic Disorder Detection Based on Structural Features
CRIVELLARI, ALBERTO
2022/2023
Abstract
Structural bioinformatics is the branch of science that analyzes the three-dimensional structures of molecules using computational techniques. Initially, the field focused primarily on proteins with a fixed three-dimensional structure. Over the last 20 years, however, researchers have increasingly turned their attention to Intrinsically Disordered Proteins (IDPs), proteins that contain disordered regions exhibiting highly heterogeneous conformations. The work in this thesis centers on recognizing IDPs from sequences by extracting features that indicate protein disorder. It presents a software tool, AlphaFold-disorder (SASA), developed by implementing the PSEA and SASA algorithms, and compares the quality of its results with that of state-of-the-art software. Development involved three major procedures: the PSEA procedure, which predicts secondary structure from the three-dimensional coordinates of amino acids; the SASA procedure, which computes the RSA (Relative Solvent Accessibility) of amino acids using the SASA library; and the FoldComp procedure, which handles .fcz files, i.e. compressed protein structure files. To assess the quality of the results, the dataset was first plotted to gain insight into the distribution of the features, and machine learning models were then implemented. Finally, the ROC and Precision-Recall curves of AlphaFold-disorder, AlphaFold-disorder (SASA), and the best machine learning model were compared. The comparison showed that AlphaFold-disorder (SASA) performs on par with AlphaFold-disorder, while the machine learning model requires more training data to surpass them.
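The abstract does not detail how per-residue RSA becomes a disorder score or how the curves are evaluated. The Python sketch below illustrates the general idea under stated assumptions: RSA as per-residue SASA divided by a residue-specific maximum, an optional moving-average smoothing (a hypothetical step, on the assumption that stretches of highly exposed residues hint at disorder), and ROC AUC computed as the rank statistic. The `MAX_ASA` constants (theoretical maxima from Tien et al., 2013), the function names, and the window size are illustrative assumptions, not the thesis's implementation.

```python
from typing import Dict, List, Sequence

# ASSUMPTION: theoretical maximum accessible surface areas in Å^2
# (Tien et al., 2013), keyed by one-letter residue code. The constants
# actually used by AlphaFold-disorder (SASA) are not given in the abstract.
MAX_ASA: Dict[str, float] = {
    "A": 129.0, "R": 274.0, "N": 195.0, "D": 193.0, "C": 167.0,
    "Q": 225.0, "E": 223.0, "G": 104.0, "H": 224.0, "I": 197.0,
    "L": 201.0, "K": 236.0, "M": 224.0, "F": 240.0, "P": 159.0,
    "S": 155.0, "T": 172.0, "W": 285.0, "Y": 263.0, "V": 174.0,
}


def relative_solvent_accessibility(sequence: str, sasa: Sequence[float]) -> List[float]:
    """RSA = per-residue SASA / maximum ASA of that residue type, capped at 1."""
    if len(sequence) != len(sasa):
        raise ValueError("sequence and SASA values must have the same length")
    return [min(s / MAX_ASA[aa], 1.0) for aa, s in zip(sequence, sasa)]


def smooth(values: Sequence[float], window: int) -> List[float]:
    """Centred moving average; the window is truncated at the chain ends.

    HYPOTHETICAL smoothing step: the window size is a free parameter here.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen disordered residue (label 1) scores higher than a randomly chosen
    ordered one (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative residue")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, not taken from the thesis.
    rsa = relative_solvent_accessibility("GAL", [104.0, 64.5, 250.0])
    print(rsa)  # [1.0, 0.5, 1.0]
    print(smooth([0.0, 0.0, 1.0, 1.0], window=3))
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0
```

The pairwise AUC is quadratic in the number of residues and is meant only to make the definition explicit; for real datasets a library routine based on sorting the scores would be the practical choice.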
Full text: Crivellari_Alberto.pdf (open access, 3.8 MB, Adobe PDF)
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are available under a CC0 license.
https://hdl.handle.net/20.500.12608/58724