Permutation-Based Inference for High-Dimensional Linear Models

The rapid growth of high-dimensional data has made reliable inference, not merely prediction, a pressing challenge in linear regression when the number of covariates far exceeds the sample size. While sparsity-inducing estimators such as the Lasso excel at variable selection, post-selection p-values and multiple-testing guarantees remain elusive. This thesis extends the permutation-based framework of De Santis et al. (2022) to Gaussian linear models, delivering valid per-variable p-values.

For every coefficient, the proposed procedure treats the remaining predictors as potential confounders. Existing approaches rely on the screening property, which requires the preliminary selection step to include all relevant variables, a condition often violated under high collinearity. To mitigate this limitation, we explore two complementary strategies. The first performs a principal component decomposition of the confounder set followed by sparse estimation in the reduced space. The second applies forward stepwise selection directly within the confounder set. In both cases, a standardized, sign-flipped score statistic is then computed for the target variable conditional on the selected components. Embedding these statistics in a Westfall–Young maxT permutation scheme automatically adjusts for dependence across tests, yielding simultaneous confidence statements that remain valid after model exploration. The method is fully non-parametric, distribution-free, and naturally extensible to generalized linear models.

An extensive simulation study spanning Toeplitz and other covariance structures evaluates type I error, power, and family-wise error rate (FWER). Across a wide range of sample sizes, signal-to-noise ratios, sparsity levels, and correlation strengths, both proposed methods reliably control the marginal type I error and achieve strong control of the FWER. Notably, the procedure based on forward stepwise selection also attains power comparable to that of state-of-the-art approaches such as ridge-projection inference and the debiased Lasso, particularly in settings with strong predictor correlation. The primary trade-off is computational cost, which is mitigated through parallelized resampling. Overall, the thesis provides a practical and theoretically grounded toolkit for rigorous inference in high-dimensional regression and paves the way for analogous advances in broader classes of models.
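
To make the pipeline in the abstract concrete, here is a minimal Python sketch of its forward-stepwise variant. It is an illustration under stated assumptions, not the thesis's implementation. All function names (simulate_toeplitz, forward_select, score_contributions, flip_maxt) and parameters (k, n_flips) are hypothetical. The greedy selection rule and the standardization by the root sum of squared score contributions are simplifications, and the exact standardized statistic of De Santis et al. (2022) may differ.

```python
import numpy as np

def simulate_toeplitz(n, p, rho, beta, seed=0):
    """Gaussian design with Toeplitz covariance Sigma[i, j] = rho**|i - j|."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    L = np.linalg.cholesky(rho ** np.abs(idx[:, None] - idx[None, :]))
    X = rng.standard_normal((n, p)) @ L.T
    return X, X @ beta + rng.standard_normal(n)

def residualise(v, D):
    """Residuals of v after least-squares projection onto the columns of D."""
    return v - D @ np.linalg.lstsq(D, v, rcond=None)[0]

def forward_select(y, Z, k):
    """Assumed greedy rule: add the column most correlated with the residual."""
    Zc = Z - Z.mean(axis=0)
    norms = np.linalg.norm(Zc, axis=0)
    selected, resid = [], y - y.mean()
    for _ in range(min(k, Z.shape[1])):
        scores = np.abs(Zc.T @ resid) / norms
        if selected:
            scores[selected] = -np.inf  # never pick a column twice
        selected.append(int(np.argmax(scores)))
        D = np.column_stack([np.ones(len(y)), Z[:, selected]])
        resid = residualise(y, D)       # refit on intercept + selected columns
    return selected

def score_contributions(y, x, Zs):
    """Per-observation score contributions for H0: coefficient of x is zero."""
    D = np.column_stack([np.ones(len(y)), Zs])
    return residualise(x, D) * residualise(y, D)

def flip_maxt(y, X, k=5, n_flips=999, seed=0):
    """Raw and single-step maxT-adjusted sign-flip p-values for every column."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    nu = np.empty((n, p))
    for j in range(p):
        Z = np.delete(X, j, axis=1)     # remaining predictors = confounders
        nu[:, j] = score_contributions(y, X[:, j], Z[:, forward_select(y, Z, k)])
    flips = rng.choice([-1.0, 1.0], size=(n_flips + 1, n))
    flips[0] = 1.0                      # row 0 reproduces the observed statistic
    # The same flips are applied to all variables, preserving their dependence.
    T = np.abs(flips @ nu) / np.sqrt((nu ** 2).sum(axis=0))
    p_raw = (T >= T[0]).mean(axis=0)
    p_adj = (T.max(axis=1)[:, None] >= T[0]).mean(axis=0)
    return p_raw, p_adj
```

A call such as flip_maxt(y, X, k=5, n_flips=999) returns one raw and one FWER-adjusted p-value per covariate. Because the flipped maximum is compared with each observed statistic, every adjusted p-value is at least as large as its raw counterpart. The loop over covariates is independent across iterations, which is where parallelization would naturally apply.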

DELLA PENNA, PAOLO

2024/2025

Record

	Faculty/Department
	
				Dipartimento di Scienze Statistiche (Department of Statistical Sciences)
			
	Degree programme
	
				Statistical Sciences, Laurea Magistrale (master's degree, D.M. 270/2004)
			
	Academic year
	
				2024
			
	English title
	
				Permutation-Based Inference for High-Dimensional Linear Models
			
	Keywords
	
				Inference
				Permutation
				Flipscore
				High-dimensional
			
	Supervisor
	
				FINOS, LIVIO
			
	Co-supervisor
	
				CORBETTA, DANIELA
			
	Appears in collections:
	
				Master's theses

Files in this item:

File	Size	Format
DellaPenna_Paolo.pdf (open access)	1.29 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/93034