Regression analysis for high-dimensional datasets
CORTESE, SOFIA
2023/2024
Abstract
Thanks to technological developments in recent decades, Big Data has revolutionized the way economic analyses are conducted. This thesis explores the main characteristics of Big Data and their importance in economics. It then highlights the issues that arise with multiple regression and proposes several solutions, such as subset selection, shrinkage methods, and machine learning techniques. The latter include several regression techniques, such as ridge regression, lasso regression, and elastic net. Using R, an open-source environment for statistical analysis, the thesis analyzes the ability of these models to handle large volumes of data and to identify the factors that are statistically most relevant for predicting the GDP growth rate.

| File | Access | Size | Format |
|---|---|---|---|
| Cortese Sofia.pdf | Open access | 3.87 MB | Adobe PDF |
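As a point of reference, ridge, lasso, and elastic net can all be written as a single penalized least-squares problem. The form below follows the parameterization used by the R package glmnet. That the thesis uses this package or this exact scaling is an assumption; the abstract does not state it. Here $n$ is the number of observations, $x_i \in \mathbb{R}^p$ the vector of predictors for observation $i$, $\lambda \ge 0$ the overall penalty strength, and $\alpha \in [0, 1]$ the mixing weight between the two penalties (the intercept $\beta_0$ is not penalized):

$$
\hat\beta(\lambda, \alpha) = \arg\min_{\beta_0,\,\beta} \; \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \beta_0 - x_i^\top \beta \right)^2 + \lambda \left[ \frac{1 - \alpha}{2} \, \lVert \beta \rVert_2^2 + \alpha \, \lVert \beta \rVert_1 \right]
$$

Setting $\alpha = 0$ gives ridge regression, $\alpha = 1$ gives the lasso, and intermediate values give the elastic net. Because the $\ell_1$ term can shrink coefficients exactly to zero, the lasso and elastic net also perform variable selection, which is one way such models can single out a small set of relevant predictors among many candidates.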
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/72581