MODELLING INFECTIOUS DISEASES

GUTIERREZ VELEZ, DANIEL
2024/2025

Abstract

This thesis develops and evaluates a comprehensive set of statistical and machine learning models to explain and forecast the spread of infectious diseases, motivated by applications in public health and the insurance industry. Using diverse datasets for COVID-19 across ten countries and for endemic diseases such as Dengue in Colombia, this work integrates epidemiological records with meteorological, environmental, behavioural (Google Trends), and policy data. The analysis spans three distinct modelling paradigms: long-term forecasting of cumulative cases with time series models (SIR, SEIR, ARIMA, GGM, GRU); short-term prediction of new cases using multivariate regression (Lasso, Random Forest, DNNs); and high-resolution spatial analysis of Dengue drivers via Generalized Additive Models (GAMs). Key findings reveal that, for long-term forecasting, simpler data-driven models such as ARIMA and diffusion models often provide more robust trend predictions than traditional mechanistic models, which struggle with the multi-wave nature of modern pandemics. In the multivariate context, penalized linear models such as Lasso offer a compelling balance of predictive accuracy and interpretability, consistently identifying lagged cases and Google Trends as significant predictors. The spatial GAM analysis identified significant, non-linear relationships between Dengue risk and environmental factors such as precipitation and high temperature. However, it also exposed poor predictive performance when compared to ensemble models. Ultimately, this thesis demonstrates that the optimal modelling strategy is contingent on the specific analytical goal. It provides a practical framework for selecting appropriate models, from interpretable regression for policy insights to spatial models for descriptive risk mapping, thereby contributing a versatile toolkit for managing the multifaceted challenges posed by infectious diseases.
2024
TIME SERIES
GAM
SIR
GGM
Files in this item:
File: GutierrezVelez_Daniel.pdf
Access: open access
Size: 1.97 MB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/91830