Analysis of Missense Tolerance Ratio (MTR) and Constrained Coding Regions (CCR) in pathological conditions-related Intrinsically Disordered Proteins (IDPs).

ROBLES AZUGARAY, NAZARETH DE JESÚS
2024/2025

Abstract

Genetic constraint is the depletion of disruptive variation caused by purifying natural selection. Coding regions that show such depletion can be inferred to have an essential function or to be involved in disease pathology. This project explores the intersection of genetically constrained regions and Intrinsically Disordered Regions (IDRs) in proteins. IDRs, which lack a fixed 3D structure, are characterized by low sequence conservation but retain functional flexibility. Recent studies have shown that, although these regions are typically more tolerant of mutations, some are strongly constrained, implying a critical biological function. In this project, we aimed to identify genetically constrained regions within IDRs of proteins related to cancer and to neurodevelopmental disorders (NDDs). For a curated set of 427 proteins, we integrated genomic variant data (gnomAD v4.1), structural-state annotations (DisProt, MobiDB) and clinically significant variants (ClinVar). We then calculated the Missense Tolerance Ratio (MTR) at each amino-acid position as a measure of regional tolerance to missense variation, and compared MTR values across structural states. The MTR distributions revealed that disordered regions and regions of missing residues generally exhibited significantly higher tolerance (MTR ≈ 0.91) than ordered domains (MTR ≈ 0.86). The prevalence of pathogenic ClinVar variants was consistently higher in ordered residues (0.66%) than in disordered regions (0.14%). Conversely, benign variants were more frequent in disordered (2.8%) than in ordered regions (2.4%), consistent with the functional flexibility and mutational tolerance of IDRs. Regarding structural composition and the incidence of clinical variants, cancer proteins contained a higher proportion of disordered residues, whereas NDD proteins were comparatively enriched in ordered domains.
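The abstract does not spell out how MTR is computed. A minimal sketch, assuming the commonly used definition: for a window of codons centred on each residue, the fraction of observed variants that are missense (missense over missense plus synonymous) is divided by the same fraction among all possible single-nucleotide changes in that window, so values below 1 indicate fewer missense variants than expected, i.e. constraint. The names `CodonCounts` and `mtr_per_residue`, the 31-codon default window and the truncation of windows at protein ends are hypothetical choices for illustration, not details taken from this thesis.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CodonCounts:
    """Hypothetical per-codon input counts."""
    obs_missense: int    # missense variants observed at this codon (e.g. from gnomAD)
    obs_synonymous: int  # synonymous variants observed at this codon
    pos_missense: int    # possible missense SNVs at this codon
    pos_synonymous: int  # possible synonymous SNVs at this codon


def mtr_per_residue(codons: List[CodonCounts], window: int = 31) -> List[Optional[float]]:
    """Return one MTR value per codon, using a window centred on that codon.

    Assumed definition (not stated in the thesis abstract):
        MTR = (obs_mis / (obs_mis + obs_syn)) / (pos_mis / (pos_mis + pos_syn))
    Windows are truncated at the protein ends; None marks positions where
    the ratio is undefined (no observed variants or no possible missense).
    """
    half = window // 2
    n = len(codons)
    result: List[Optional[float]] = []
    for i in range(n):
        win = codons[max(0, i - half): min(n, i + half + 1)]
        obs_mis = sum(c.obs_missense for c in win)
        obs_tot = obs_mis + sum(c.obs_synonymous for c in win)
        pos_mis = sum(c.pos_missense for c in win)
        pos_tot = pos_mis + sum(c.pos_synonymous for c in win)
        if obs_tot == 0 or pos_mis == 0:
            result.append(None)
            continue
        result.append((obs_mis / obs_tot) / (pos_mis / pos_tot))
    return result


# Toy protein of 5 identical codons (made-up counts, not thesis data).
toy = [CodonCounts(2, 1, 6, 2) for _ in range(5)]
print(mtr_per_residue(toy))
```

In this toy case every window covers the whole protein, so each position gets MTR = (2/3) / (6/8) ≈ 0.89. A sliding window smooths single-codon noise, at the cost of blurring sharp boundaries between ordered and disordered segments.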
Importantly, disordered residues in cancer proteins were enriched in benign variants, whereas ordered domains in NDD proteins showed an increased incidence of pathogenic mutations. This study helps refine our understanding of the biological relevance of IDRs and highlights the importance of identifying critical regulatory elements within disordered protein regions.
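Per-state percentages of clinical variants require mapping region-level structural annotations onto individual residues and then counting variants per state. A minimal sketch under stated assumptions: annotations are `(start, end, state)` tuples with 1-based inclusive coordinates, overlaps are resolved by a hypothetical precedence order (`PRECEDENCE`), and prevalence is taken to mean the percentage of residues in a state carrying at least one variant of a given ClinVar significance. The thesis may define these differently; `residue_states` and `prevalence_by_state` are illustrative names only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical precedence for overlapping annotations: lower index wins.
PRECEDENCE = ("ordered", "disordered", "missing")
RANK = {state: i for i, state in enumerate(PRECEDENCE)}


def residue_states(length: int, regions: Iterable[Tuple[int, int, str]]) -> List[str]:
    """Map 1-based inclusive (start, end, state) regions onto per-residue labels.

    `state` must be one of PRECEDENCE; residues covered by no region stay
    "unannotated", and coordinates outside 1..length are ignored.
    """
    labels = ["unannotated"] * length
    for start, end, state in regions:
        for pos in range(max(start, 1), min(end, length) + 1):
            current = labels[pos - 1]
            if current == "unannotated" or RANK[state] < RANK[current]:
                labels[pos - 1] = state
    return labels


def prevalence_by_state(
    labels: List[str], variants: Iterable[Tuple[int, str]], significance: str
) -> Dict[str, float]:
    """Percentage of residues in each state with >= 1 variant of `significance`."""
    residues_per_state: Dict[str, int] = defaultdict(int)
    for state in labels:
        residues_per_state[state] += 1
    # A residue counts once, however many qualifying variants it carries.
    hit_positions = {
        pos for pos, sig in variants if sig == significance and 1 <= pos <= len(labels)
    }
    hits_per_state: Dict[str, int] = defaultdict(int)
    for pos in hit_positions:
        hits_per_state[labels[pos - 1]] += 1
    return {
        state: 100.0 * hits_per_state[state] / count
        for state, count in residues_per_state.items()
    }


# Toy example (made-up data, not results from the thesis).
labels = residue_states(10, [(1, 6, "ordered"), (5, 10, "disordered")])
variants = [(2, "pathogenic"), (3, "pathogenic"), (8, "benign"), (9, "pathogenic")]
print(prevalence_by_state(labels, variants, "pathogenic"))
```

Running the same two functions separately on the cancer and NDD protein sets would give a per-group comparison of structural composition and variant incidence; counting residues rather than raw variants keeps one heavily reported position from dominating a state's percentage.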
2024
Keywords: Genetic constraint; Variants; Cancer; NDDs; Mapping
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/94294