This work offers an overview of the tools and techniques used for Big Data analysis, with a particular focus on approaches for estimating linear and additive models and for constructing prediction intervals. We start by presenting the paradigms of parallel and distributed computing, followed by modern cloud computing infrastructures and their associated software. We then consider the problem of estimating linear models in a Big Data setting, where the data do not fit in memory. We also consider the problem of building prediction intervals and tackle it using the split conformal approach. Finally, we present two applications, on real and simulated data respectively, to show how the techniques and algorithms described can be implemented in the cloud.
This work aims to offer an overview of the tools and techniques used to analyze large volumes of data, with particular attention to techniques for estimating linear models and for defining prediction intervals. It begins by presenting the paradigms of parallel and distributed computing, and then moves on to modern cloud computing infrastructures and the associated software. It then considers the problem of estimating linear models in Big Data settings, where the data cannot be loaded into memory. In addition, it considers the problem of constructing prediction intervals, presenting the split conformal approach. Finally, two applications are presented, on real and simulated data respectively, to show the implementation of the techniques and algorithms described in a cloud environment.
Big Data analysis in the cloud: overview and applications
PIRACCINI, ALESSIO
2021/2022
File | Access | Size | Format |
---|---|---|---|
Piraccini_Alessio.pdf | open access | 1.28 MB | Adobe PDF |
The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/38809