Post-Selection Inference in Multiverse Analysis: The PIMA Model as a Solution to the Replicability Crisis
VERA SANCHEZ, FABRIZIO GABRIEL
2025/2026
Abstract
This thesis examines the replicability crisis in empirical sciences, highlighting how analytical flexibility, p‑hacking, and unregistered discretionary choices undermine the reliability of scientific findings. After discussing the theoretical roots of the problem and the evidence emerging from large‑scale replication projects, the work introduces Multiverse Analysis as a tool for transparency, while emphasizing its lack of a formal inferential framework. In this context, PIMA (Post‑selection Inference in Multiverse Analysis) is presented as a model based on the Sign Flipping Score Test that enables valid inference after exploring multiple analytical specifications, controlling the Family‑Wise Error Rate even in the presence of highly correlated models. The empirical application uses a reaction‑time dataset to assess the effect of lexical frequency across 18 outcome transformations. The results show that, although some specifications produce local signals of significance, none survive the maxT correction: the high‑frequency (HF) effect is therefore weak, unstable, and not robust to analytical choices. The comparison with alternative approaches—preregistration, Registered Reports, Bayesian methods, and multiverse meta‑analysis—situates PIMA within a broader ecosystem of solutions to the replicability crisis. The thesis concludes that PIMA represents a solid methodological contribution for improving the credibility of empirical research, offering rigorous and transparent inference in the presence of multiple analytical specifications.

| File | Size | Format |
|---|---|---|
| VeraSanchez_FabrizioGabriel.pdf (restricted access) | 391.93 kB | Adobe PDF |
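The core idea summarized in the abstract — a sign-flipping test whose p-values are adjusted with the maxT correction across all specifications of the multiverse — can be sketched as follows. This is a minimal illustrative sketch, not the thesis's actual PIMA implementation: the function name `sign_flip_maxT`, the simple standardized-sum statistic, and the input layout (one column of per-observation score contributions per specification) are all assumptions made for the example.

```python
import numpy as np

def sign_flip_maxT(score_contributions, n_flips=5000, seed=0):
    """Sign-flipping test with maxT correction (illustrative sketch).

    score_contributions: (n, m) array of per-observation score
    contributions, one column per model specification.
    Returns maxT-adjusted p-values, one per specification,
    jointly controlling the Family-Wise Error Rate.
    """
    rng = np.random.default_rng(seed)
    n, m = score_contributions.shape

    def stats(signs):
        # Flip each observation's contribution, then standardize the sum.
        flipped = score_contributions * signs[:, None]
        return np.abs(flipped.sum(axis=0)) / np.sqrt((flipped ** 2).sum(axis=0))

    t_obs = stats(np.ones(n))  # observed statistic for each specification

    # Null distribution of the *maximum* statistic across specifications:
    # flipping signs jointly preserves the correlation between models.
    max_null = np.empty(n_flips)
    for b in range(n_flips):
        signs = rng.choice([-1.0, 1.0], size=n)
        max_null[b] = stats(signs).max()

    # maxT-adjusted p-value: how often the null maximum reaches t_obs.
    return (1 + (max_null[None, :] >= t_obs[:, None]).sum(axis=1)) / (1 + n_flips)
```

Because every random sign vector is applied to all specifications at once, the null distribution of the maximum automatically reflects the dependence between models, which is what allows the correction to remain powerful even when the 18 outcome transformations are highly correlated.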
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license; metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/106087