Classification and Automatic Annotation of Tandem Repeat Proteins in RepeatsDB

MOZAFFARI, SOROUSH
2022/2023

Abstract

Protein tandem repeats are structural elements involved in many biological processes, with essential roles in cell adhesion, protein-protein interactions, and molecular recognition. These repetitive regions have attracted considerable interest in structural biology and bioinformatics, leading to specialized resources such as RepeatsDB, a curated database of annotated tandem repeat protein structures. In this study, we systematically analyzed the protein tandem repeats in RepeatsDB, focusing on Alpha-Solenoids and Beta-Propellers, to improve the existing classification system and deepen understanding of these proteins. We began with a statistical analysis of the diversity and population of the distinct repeat groups in the database and of how completely each group is annotated. This analysis helped address the many entries that lack annotation. We then carried out a structural analysis based on pairwise structural alignment and explored dimensionality reduction and visualization techniques to uncover new structural relationships. The results clarified how the structures compare with one another and informed a refined classification system. We used the density-based clustering algorithm DBSCAN to establish structural similarity ranges for Clan members and to provide computational support for defining Clan boundaries. This method proved effective at detecting outlier entries and refining existing Clans, and it led us to propose new repeat groups. Finally, we ran a supervised classification experiment with the K-Nearest Neighbors (KNN) algorithm, which enabled automatic annotation of previously unannotated entries.
This study introduces an automatic annotation methodology that significantly improves the efficiency of RepeatsDB curation and can be extended to other bioinformatics applications. The findings contribute to a more complete understanding of protein tandem repeats and offer insights for future research in structural biology and bioinformatics.
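The two algorithmic steps named in the abstract, density-based clustering and nearest-neighbour annotation, can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes that pairwise structural alignment has already been converted into a symmetric distance matrix (for example, one minus a similarity score), and the matrix values, the `eps`, `min_samples`, and `k` settings, and the function names `dbscan` and `knn_predict` are all hypothetical choices made for illustration.

```python
from collections import Counter

NOISE = -1  # label for entries that belong to no dense region (outliers)


def dbscan(dist, eps, min_samples):
    """Cluster points from a full distance matrix; returns one label per point."""
    n = len(dist)
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        # The neighbourhood includes the point itself.
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbors_j = [m for m in range(n) if dist[j][m] <= eps]
            if len(neighbors_j) >= min_samples:
                queue.extend(neighbors_j)  # j is a core point, keep expanding
    return labels


def knn_predict(dist_to_labeled, labeled_classes, k):
    """Majority vote among the k annotated entries closest to the query."""
    order = sorted(range(len(dist_to_labeled)), key=lambda j: dist_to_labeled[j])
    votes = Counter(labeled_classes[j] for j in order[:k])
    return votes.most_common(1)[0][0]


# Toy distance matrix (assumed values): entries 0-2 and 3-4 form two tight
# groups, entry 5 is far from everything.
DIST = [
    [0.0, 0.1, 0.2, 0.8, 0.8, 0.9],
    [0.1, 0.0, 0.1, 0.8, 0.8, 0.9],
    [0.2, 0.1, 0.0, 0.8, 0.8, 0.9],
    [0.8, 0.8, 0.8, 0.0, 0.1, 0.9],
    [0.8, 0.8, 0.8, 0.1, 0.0, 0.9],
    [0.9, 0.9, 0.9, 0.9, 0.9, 0.0],
]
labels = dbscan(DIST, eps=0.3, min_samples=2)

# Annotate a hypothetical unannotated entry from its distances to entries 0-4.
CLASSES = ["Alpha-Solenoid"] * 3 + ["Beta-Propeller"] * 2
predicted = knn_predict([0.15, 0.2, 0.1, 0.7, 0.8], CLASSES, k=3)
print(labels, predicted)
```

In this toy setup the isolated entry receives the noise label, which mirrors how outlier entries could be flagged for curator review, while the query entry inherits the class held by the majority of its nearest annotated neighbours. In practice a library implementation that accepts a precomputed distance matrix would likely be preferable to this hand-written version.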
2022
Protein Repeat
Classification
Annotation
RepeatsDB
Machine Learning
Files in this item:

File: Soroush_Mozaffari_Thesis.pdf
Access: open access
Size: 7.7 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 License.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/48063