Educational Learning Analytics: Data-Driven Approaches to Student Performance Prediction

SAGADIYEVA, AIGERIM
2024/2025

Abstract

In the evolving landscape of digital education, Educational Learning Analytics (ELA) offers powerful tools for extracting actionable insights from student interaction data. This study applies ELA to a multi-year dataset from a university-level programming course hosted on Moodle, with the aim of developing robust predictive models of student performance. Going beyond conventional approaches, the research examines how engagement with specific activity types, such as Virtual Programming Lab (VPL) assignments, quizzes, and general Moodle interactions, can serve as an early indicator of academic success or risk. A key contribution of this work is its assessment of the temporal and contextual variability of predictive models across three academic cohorts (2021–2024). The results reveal that model performance is sensitive to cohort-specific behaviors and instructional changes, but that combining data from multiple years improves generalizability while reducing overfitting. The study also shows that redefining the prediction target, for example shifting the focus from final course outcomes to earlier milestones such as midterm or first exam session performance, yields more balanced and accurate predictions. Feature importance analyses consistently rank VPL-related metrics as the most predictive, underscoring the importance of practical, hands-on activities in learning to program. Quiz features, by contrast, offer less predictive value, likely because of their limited coverage and higher variability. Methodologically, combining ensemble learning (soft voting over Random Forest and k-nearest neighbors) with oversampling proved highly effective in addressing class imbalance, significantly improving F1-scores, especially for underrepresented outcome groups. Despite challenges posed by inconsistent LMS data structures and variation in course design, the study demonstrates the feasibility of building scalable, context-aware predictive systems.
Its findings offer practical implications for the design of adaptive learning environments, supporting more personalized, data-informed education.
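The abstract's core modeling recipe, a soft-voting ensemble of Random Forest and k-nearest neighbors trained on oversampled data, can be sketched as follows. This is a minimal illustration using scikit-learn with synthetic data and simple random oversampling; the feature set, dataset, and hyperparameters are stand-ins, not the thesis's actual pipeline.

```python
# Sketch: soft-voting ensemble (Random Forest + KNN) with random
# oversampling of the minority class, as described in the abstract.
# The synthetic data stands in for per-student activity features
# (e.g. VPL submissions, quiz attempts, Moodle log counts) and is
# illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.utils import resample

# Imbalanced toy dataset: ~15% of students in the at-risk class (label 1).
X, y = make_classification(n_samples=600, n_features=8,
                           weights=[0.85, 0.15], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Random oversampling: duplicate minority-class rows until both classes
# are the same size in the training set.
majority, minority = X_tr[y_tr == 0], X_tr[y_tr == 1]
minority_up = resample(minority, replace=True, n_samples=len(majority),
                       random_state=0)
X_bal = np.vstack([majority, minority_up])
y_bal = np.array([0] * len(majority) + [1] * len(minority_up))

# Soft voting averages the predicted class probabilities of both models.
ensemble = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("knn", KNeighborsClassifier(n_neighbors=5))],
    voting="soft",
)
ensemble.fit(X_bal, y_bal)
print(f"F1 (minority class): {f1_score(y_te, ensemble.predict(X_te)):.2f}")
```

Soft voting is preferable to hard voting here because it lets a confident Random Forest probability outweigh a borderline KNN vote, which tends to help on the underrepresented class the abstract highlights.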
Learning Analytics
Machine Learning
Predictive Modeling
Files in this record:
AigerimSagadiyeva_Thesis.pdf (restricted access, 4.42 MB, Adobe PDF)

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/89972