Analysis of Missense Tolerance (MTR) and Constrained Coding Regions (CCR) in pathological conditions-related Intrinsically Disordered Proteins (IDPs).
ROBLES AZUGARAY, NAZARETH DE JESÚS
2024/2025
Abstract
Genetic constraint is the depletion of disruptive variation caused by purifying natural selection. Coding regions of the genome that show this depletion can be inferred to have an essential function or to be involved in disease pathology. This project explores the intersection of genetically constrained regions and Intrinsically Disordered Regions (IDRs) in proteins. IDRs, which lack a fixed 3D structure, are characterized by low sequence conservation but retain functional flexibility. Recent studies have shown that, while these regions are typically more tolerant of mutations, some show strong constraint, implying critical biological function. In this project, we aimed to identify genetically constrained regions within IDRs in proteins related to cancer and to neurodevelopmental disorders (NDDs). For a curated set of 427 proteins, we integrated genomic variant data (gnomAD v4.1), structural status information (DisProt, MobiDB) and clinically significant variants (ClinVar). We also calculated the Missense Tolerance Ratio (MTR) of each amino acid in their sequences as a metric of regional tolerance to missense variation, and compared these values with the structural status of each residue. The MTR distributions revealed that disordered and missing-residue regions generally exhibited significantly higher tolerance (MTR ≈ 0.91) than ordered domains (MTR ≈ 0.86). The prevalence of pathogenic ClinVar variants was consistently higher in ordered residues (0.66%) than in disordered regions (0.14%). Conversely, benign variants were more frequent in disordered regions (2.8%) than in ordered regions (2.4%), supporting the functional flexibility and mutational tolerance of IDRs. Regarding structural composition and the incidence of clinical variants, we observed a higher proportion of disordered residues in cancer proteins, while NDD proteins were comparatively enriched for ordered domains.
Importantly, disordered residues in cancer proteins were enriched in benign variants, while ordered domains in NDD proteins had an increased incidence of pathogenic mutations. This study helps to refine our understanding of the biological relevance of IDRs and highlights the importance of identifying critical regulatory elements within disordered regions of the genome.

| File | Size | Format |
|---|---|---|
| NazarethRobles_Thesis.pdf (restricted access) | 1.66 MB | Adobe PDF |
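
The abstract uses the Missense Tolerance Ratio (MTR) as its per-residue measure of constraint but does not state how it was computed. The sketch below follows the commonly published definition of MTR: within a sliding window of codons, the observed fraction of missense variants among missense plus synonymous variants is divided by the same fraction computed over all possible single-nucleotide changes. The function name, the argument names, the 31-codon default window and the per-codon count lists are assumptions made for illustration; they are not taken from the thesis.

```python
from typing import List, Optional, Sequence


def missense_tolerance_ratio(
    obs_mis: Sequence[int],   # observed missense variants per codon (hypothetical input)
    obs_syn: Sequence[int],   # observed synonymous variants per codon
    poss_mis: Sequence[int],  # possible missense changes per codon
    poss_syn: Sequence[int],  # possible synonymous changes per codon
    window: int = 31,         # assumed window size in codons
) -> List[Optional[float]]:
    """Return one MTR value per codon, using a centred window truncated at the termini."""
    n = len(obs_mis)
    if not (len(obs_syn) == len(poss_mis) == len(poss_syn) == n):
        raise ValueError("all per-codon count lists must have the same length")

    half = window // 2
    scores: List[Optional[float]] = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        o_mis, o_syn = sum(obs_mis[lo:hi]), sum(obs_syn[lo:hi])
        p_mis, p_syn = sum(poss_mis[lo:hi]), sum(poss_syn[lo:hi])

        # The ratio is undefined when the window holds no observed variants
        # or no possible missense changes.
        if o_mis + o_syn == 0 or p_mis == 0:
            scores.append(None)
            continue

        observed = o_mis / (o_mis + o_syn)
        expected = p_mis / (p_mis + p_syn)
        scores.append(observed / expected)
    return scores
```

Under this definition, values below 1 indicate fewer missense variants than expected from the sequence context, so the lower average reported for ordered domains (≈ 0.86) than for disordered regions (≈ 0.91) corresponds to stronger constraint.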
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 License.
https://hdl.handle.net/20.500.12608/94294