Predictive Modelling of Student Outcomes: Risk Factors, Early Dropout, and Time-to-Degree

ZAMPELLI, SARA
2024/2025

Abstract

This thesis explores the determinants of academic success by analysing a large-scale dataset extracted from the internal systems of Vita-Salute San Raffaele University. The dataset includes key variables that provide a rich, context-specific foundation for the analysis. By applying predictive models and advanced analytics techniques, the research seeks to generate actionable insights that enhance academic management and guide timely interventions for at-risk students. The central objective is to develop a comprehensive analytical framework that leverages both parametric and non-parametric approaches to assess and predict student success. The analysis focuses on core factors and their respective impacts on academic performance, time to graduation, and dropout probability. It addresses critical challenges that universities face today, including improving graduation rates, offering personalised academic support, and evaluating the educational impact of support services. The study introduces a multi-method predictive modelling pipeline that integrates penalised logistic and Cox regressions, ensemble learning algorithms (Random Forest and XGBoost), generalised additive models, and survival analysis. All models are complemented by model-agnostic interpretability techniques such as SHAP values, which indicate the contribution of each feature and support a more transparent understanding of key drivers. The findings clarify the relative importance of academic performance and demographic variables: weighted CFU (University Educational Credits) and age at enrolment are the strongest predictors of on-time graduation; static profile data alone provide limited power to forecast early dropout; and entrance exam scores exhibit a complex, nonlinear relationship with final CFU. Together, these results offer evidence-based guidance for targeted student support and resource allocation.
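As an illustration of one component named in the abstract, the sketch below fits a penalised logistic regression on synthetic data. It is not the thesis's implementation: this record does not state the penalty type, the software, or the exact variables used, so the L2 (ridge) penalty, the feature names (`credits`, `age`, `noise`), the simulated effect sizes, and every function name are assumptions made only for the example. It uses the Python standard library alone so that it runs without extra packages.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_ridge_logistic(X, y, lam=0.1, lr=0.1, epochs=500):
    """Batch gradient descent on mean log-loss + (lam / 2) * ||w||^2.

    The intercept b is left unpenalised, as is conventional.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        # Penalty gradient lam * w shrinks coefficients towards zero.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def make_synthetic(n=400, seed=0):
    # Hypothetical standardised features; the effect sizes are invented.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        credits = rng.gauss(0, 1)  # stand-in for weighted credits
        age = rng.gauss(0, 1)      # stand-in for age at enrolment
        noise = rng.gauss(0, 1)    # feature with no true effect
        p_on_time = sigmoid(1.5 * credits - 1.0 * age)
        y.append(1 if rng.random() < p_on_time else 0)
        X.append([credits, age, noise])
    return X, y


def l2_norm(w):
    return math.sqrt(sum(v * v for v in w))


X, y = make_synthetic()
w_weak, b_weak = fit_ridge_logistic(X, y, lam=0.01)
w_strong, b_strong = fit_ridge_logistic(X, y, lam=1.0)

print("weak penalty coefficients:  ", [round(v, 2) for v in w_weak])
print("strong penalty coefficients:", [round(v, 2) for v in w_strong])
```

Comparing the two fits shows the role of the penalty: a larger `lam` pulls all coefficients towards zero, trading some fit for stability. An L1 or elastic-net penalty would behave differently, for instance by setting some coefficients exactly to zero.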
2024
academic performance
dropout prediction
student data
Files in this item:
File: Zampelli Sara.pdf
Access: Restricted
Size: 1.54 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/91846