Comparison of Statistical Approaches for Genome-Wide Association Studies with Time-to-Event Outcomes

SABBATINI, DANIELE
2023/2024

Abstract

Background: Genome-wide association studies (GWAS) have historically relied on linear and logistic regression. Many clinical outcomes, however, are time-to-event in nature and call for survival analysis approaches such as Cox proportional hazards models. Population stratification is a key methodological challenge in GWAS: differences in allele frequencies across populations can introduce spurious associations and obscure true genetic signals. Populations may also differ in environmental factors and clinical management, further confounding the link between genetic markers and disease-related outcomes. The relative efficiency of conventional GWAS approaches versus Cox-based approaches for identifying genetic effects on survival outcomes remains unclear, particularly in the presence of population stratification and hierarchical data structures (e.g., multi-center studies).

Objective: The primary objective of this study is to determine whether principal component (PC)–based adjustment remains an adequate and reliable strategy for correcting population stratification in GWAS when the outcome is time-to-event. To this end, we systematically compare the statistical power, ranking accuracy, and effect size estimation of several analytical approaches that explicitly account for population structure, including Cox models adjusted for PCs. The methods are evaluated under two scenarios: (1) time-to-event outcomes influenced by a random center variable that is independent of the underlying genetic structure; and (2) time-to-event outcomes influenced by population of origin, reflecting both differential standards of care across populations and systematic differences in allele frequencies across ancestral groups.

Methods: An extensive simulation framework was implemented, comprising 1,600 simulations across 16 experimental conditions defined by the proportion of genetic variance explained (1%, 1.5%, 3.2% and 8%) and the sample size (N = 20, 500, 1000 and 2000). Phenotypes were generated with PhenotypeSimulator using a single causal SNP and were then converted into survival outcomes according to an exponential distribution. Two clustering variables were defined: a random center variable (random assignment, with no confounding) and a population variable obtained by k-means clustering of the genetic principal components (true population stratification, which confounds the results). Five methods were compared in each scenario: GWAS linear regression (reference), Cox model with gamma frailty, Cox model with cluster-robust standard errors, Cox model with principal components as covariates, and a scenario-specific Gold Standard (Cox model with the true clustering variable as a fixed effect). Performance metrics were statistical power (p < 5×10⁻⁸), the median rank of the causal SNP among 92,988 tested variants, and estimation bias relative to the true simulated effect.

Results/Conclusions: In our semi-synthetic simulations, PC-based adjustment in Cox proportional hazards models behaved broadly in line with standard GWAS practice. Under population-driven confounding, Cox models adjusted for PC1–PC2 performed comparably to a Gold Standard Cox model that included the true population labels, particularly in terms of causal-variant prioritization and effect-size agreement. Overall, these results support the adequacy of PCA-based correction for population stratification in survival GWAS and suggest that established GWAS adjustment strategies can be extended to time-to-event outcomes under similar settings, with the caveat that adding adjustment covariates carries an expected efficiency (power) trade-off.
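As an illustration of two steps named in the Methods, the following is a minimal Python sketch, not the thesis code. It assumes that the exponential conversion is done by inverse-transform sampling with a hazard of the form baseline_hazard × exp(linear predictor), and that power and rank are computed per replicate from a vector of p-values. The function names, the baseline hazard value, and the toy p-values are hypothetical and chosen only for the example.

```python
import math
import random
import statistics

GENOME_WIDE_ALPHA = 5e-8  # significance threshold quoted in the abstract


def exponential_survival_time(linear_predictor, baseline_hazard=0.1, rng=random):
    """Draw one event time from an exponential model whose hazard is
    baseline_hazard * exp(linear_predictor), by inverse-transform sampling.
    The baseline hazard of 0.1 is an arbitrary illustrative value."""
    hazard = baseline_hazard * math.exp(linear_predictor)
    u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is finite
    return -math.log(u) / hazard


def causal_rank(pvalues, causal_index):
    """Rank of the causal SNP among all tested variants (1 = smallest p-value)."""
    causal_p = pvalues[causal_index]
    return 1 + sum(1 for p in pvalues if p < causal_p)


def summarize(replicates, causal_index, alpha=GENOME_WIDE_ALPHA):
    """Empirical power and median causal-SNP rank over simulation replicates.
    Each replicate is a list of p-values, one per tested variant."""
    hits = [rep[causal_index] < alpha for rep in replicates]
    ranks = [causal_rank(rep, causal_index) for rep in replicates]
    return sum(hits) / len(hits), statistics.median(ranks)


# Toy input: two replicates, three variants each, causal SNP at index 0.
toy_replicates = [
    [1e-9, 0.2, 0.5],   # causal SNP is genome-wide significant and ranked first
    [1e-3, 1e-4, 0.7],  # causal SNP is not significant and ranked second
]
power, median_rank = summarize(toy_replicates, causal_index=0)
```

In this toy input the causal SNP passes the threshold in one of the two replicates and is ranked first and second, so the summary gives a power of 0.5 and a median rank of 1.5. In a real run each replicate would hold the 92,988 p-values produced by one of the five compared methods.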
2023
GWAS
Survival analysis
Time-to-event
Method comparison
Files in this item:
File: daniele_sabbatini_2092812.pdf
Access: Restricted
Size: 3.67 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/103253