Contacts prediction of linear peptides from genomic data

Clementel, Damiano
2021/2022

Abstract

The rise of metagenomics and technological improvements in bioinformatics and computational biology have led to an exponential increase in the amount of biological data available for study. However, biological data are analyzed much more slowly than they are stored. This gap has driven the development of programs capable of extracting significant information from newly sourced data without human intervention. More specifically, some of these programs have been developed to infer structural information from protein sequences. Since the structure of a protein is tightly bound to its function, the importance of this task is clear. Contact maps are among the structural information that can be inferred from a protein sequence. A contact map defines whether two residues are functionally linked, either within the same protein chain or across two different chains. Although much work has been carried out on intra-chain contact map prediction from sequence information, far less is available on inter-chain contact maps. Moreover, methods are usually presented and tested on benchmark datasets generated for that purpose. In this thesis, a complete pipeline for both intra-chain and inter-chain contact prediction is presented. Instead of using a generic benchmark set of protein sequences as input, the pipeline starts from predictions of linear interacting peptides at the residue level. Linear interacting peptides are regions of a protein sequence that are thought not to have a fixed fold, but to adapt their structure to the functional needs of the protein itself. Even fewer studies in the literature have addressed this specific issue. Finally, an analysis of the results is carried out. The analysis focuses on evaluating the methods employed for contact prediction on the given dataset. Particular attention is paid to comparing the performance on inter-chain alignments with that achieved on intra-chain alignments. Furthermore, the effect of linear interacting peptides is taken into account.
2021-04-21
101
Keywords: LIP, contact prediction, genomic data
Files in this item:
File: tesi_ClementelDef.pdf
Access: open access
Size: 2.25 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/23452