Permutation-Based Inference for High-Dimensional Linear Models

DELLA PENNA, PAOLO
2024/2025

Abstract

The rapid growth of high-dimensional data has made reliable inference, not merely prediction, a pressing challenge in linear regression when the number of covariates far exceeds the sample size. While sparsity-inducing estimators such as the Lasso excel at variable selection, valid post-selection p-values and multiple-testing guarantees remain elusive. This thesis extends the permutation-based framework of De Santis et al. (2022) to Gaussian linear models, delivering valid per-variable p-values. For every coefficient, the proposed procedure treats the remaining predictors as potential confounders. A preliminary selection step among these confounders normally relies on the screening property, which requires that the step retain all relevant variables; this condition is often violated under high collinearity. To mitigate this limitation, we explore two complementary strategies. The first performs a principal component decomposition of the confounder set followed by sparse estimation in the reduced space. The second applies forward stepwise selection directly within the confounder set. In both cases, a standardized, sign-flipped score statistic is then computed for the target variable conditional on the selected components. Embedding these statistics in a Westfall–Young maxT permutation scheme automatically adjusts for dependence across tests, yielding simultaneous confidence statements that remain valid after model exploration. The method is fully non-parametric, distribution-free, and naturally extensible to generalized linear models. An extensive simulation study spanning Toeplitz and other covariance structures evaluates type I error, power, and family-wise error rate (FWER). Across a wide range of sample sizes, signal-to-noise ratios, sparsity levels, and correlation strengths, both proposed methods reliably control the marginal type I error and provide strong control of the FWER.
Notably, the procedure based on forward stepwise selection also attains power comparable to that of state-of-the-art approaches such as ridge-projection inference and the debiased Lasso, particularly in settings with strong predictor correlation. The primary trade-off is computational cost, which is mitigated through parallelized resampling. Overall, the thesis provides a practical and theoretically grounded toolkit for rigorous inference in high-dimensional regression and paves the way for analogous advances in broader classes of models.
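As a rough illustration of the testing step described in the abstract, the sketch below combines a sign-flipped score statistic with a single-step maxT adjustment in Python with NumPy. It is not the thesis code: the function names (`residualize`, `flip_maxT`), the way confounders are supplied (a list of column indices per variable, assumed to have been chosen beforehand by some selection step), and the simple standardization by the root sum of squared score contributions are all assumptions made for illustration. The standardized statistic used in the thesis may differ.

```python
import numpy as np


def residualize(v, Z):
    """Residuals of v after least-squares projection on Z plus an intercept."""
    Z1 = np.column_stack([np.ones(len(v)), Z])
    coef, *_ = np.linalg.lstsq(Z1, v, rcond=None)
    return v - Z1 @ coef


def flip_maxT(y, X, confounders, n_flips=999, seed=0):
    """Sign-flip score tests with a single-step maxT adjustment (illustrative).

    confounders[j] lists the column indices of X used as confounders when
    testing variable j (hypothetical input; assumed selected beforehand).
    Returns raw and maxT-adjusted two-sided p-values, one per column of X.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Per-observation score contributions: (x_j residual) * (y residual).
    nu = np.empty((n, p))
    for j in range(p):
        Z = X[:, confounders[j]]
        nu[:, j] = residualize(X[:, j], Z) * residualize(y, Z)

    # Simple standardization so that all variables share a common scale.
    scale = np.sqrt((nu ** 2).sum(axis=0))

    # The same random sign vectors are used for every variable, which keeps
    # the dependence between test statistics in the joint null distribution.
    flips = rng.choice([-1.0, 1.0], size=(n_flips + 1, n))
    flips[0] = 1.0  # first row is the identity flip, i.e. the observed data

    T = np.abs(flips @ nu) / scale      # shape (n_flips + 1, p)
    t_obs = T[0]
    max_null = T.max(axis=1)            # maximum statistic for each flip

    p_raw = (T >= t_obs).mean(axis=0)
    p_adj = (max_null[:, None] >= t_obs[None, :]).mean(axis=0)
    return p_raw, p_adj
```

Calling `flip_maxT(y, X, confounders)` with `confounders[j]` set to the indices retained for variable `j` returns one raw and one adjusted p-value per column. Because each flip is independent of the others, the loop over flips (here a single matrix product) is the part that can be split across workers when the number of variables or flips is large.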
Keywords: Inference, Permutation, Flipscore, High-dimensional
File: DellaPenna_Paolo.pdf (open access, 1.29 MB, Adobe PDF)

The text of this website © Università degli Studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/93034