Integrating Fine-Tuned LLMs and LSTM-Based Knowledge Tracing for Adaptive Learning in C Programming: A Neo-Piagetian Approach

SANCHEZ PUMA, MICHELLE ELIZABETH
2025/2026

Abstract

Students in introductory C programming courses frequently struggle to understand code, because proficiency requires more than memorizing syntax; it demands a gradual grasp of abstract programming concepts. Learning models such as the Neo-Piagetian framework hold that novice programmers progress through distinct cognitive stages before they develop a true understanding of programming. The framework describes three stages of increasing complexity: sensorimotor, pre-operational, and concrete operational. Guiding students through these stages typically requires personalized one-to-one interaction, which is often infeasible in large educational settings. Large Language Models (LLMs) are artificial intelligence (AI) systems that could, in principle, simulate individual tutoring through their capabilities in code generation, debugging, and natural language explanation. However, traditional AI tools do not account for the staged learning proposed by Neo-Piagetian theory, and they are not designed to identify the specific cognitive barriers that prevent students from forming correct abstractions. This thesis proposes an integrated architecture that combines LLMs and knowledge tracing (KT) to provide precise cognitive diagnosis, adaptive remediation, and academic risk prediction. The methodology applies three components in sequence. First, a Diagnostic Engine, driven by a fine-tuned open-weight LLM, identifies missing programming skills and classifies student understanding within the Neo-Piagetian stages directly from exercise code submissions. Second, a Remediation Engine uses few-shot prompting to generate tailored exercises matched to the student's actual cognitive level. Third, a Predictive Engine feeds these enriched diagnostics into a neural network to forecast the probability that a student fails the exam. The three components are integrated into a functional web application that provides actionable feedback to both instructors and learners. The results show that the fine-tuned open-weight model achieved a diagnostic accuracy of 74% in cognitive-stage classification, outperforming general-purpose commercial LLMs. By enforcing adherence to the learning framework through prompt engineering, the system generated stage-appropriate exercises. In addition, integrating the cognitive labels into the knowledge tracing system improved the prediction of student outcomes for the first partial exam from a baseline of 67% to 76%. Overall, the proposed system offers a pedagogically grounded solution for personalized C programming learning; future real-world classroom deployments are intended to validate its impact on final exam results.
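The remediation step described in the abstract, few-shot prompting conditioned on the student's cognitive stage, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Example` type, the `FEW_SHOT` exercises, the `build_prompt` function, and the prompt wording are hypothetical and are not taken from the thesis. Only the three stage names come from the abstract.

```python
from dataclasses import dataclass

# Stage names as listed in the abstract.
STAGES = ("sensorimotor", "pre-operational", "concrete operational")


@dataclass(frozen=True)
class Example:
    """One hypothetical few-shot example: an exercise tagged with a stage."""
    stage: str
    skill: str
    exercise: str


# Hypothetical example bank; the thesis' actual examples are not known here.
FEW_SHOT = [
    Example("sensorimotor", "for loops",
            "Trace this loop by hand and list the values of i: "
            "for (int i = 0; i < 3; i++) {}"),
    Example("pre-operational", "for loops",
            "Explain in one sentence what this loop computes: "
            "for (int i = 0; i < n; i++) sum += a[i];"),
    Example("concrete operational", "for loops",
            "Write a function that returns the largest element "
            "of an int array of length n."),
]


def build_prompt(stage, skill, examples=FEW_SHOT, max_shots=2):
    """Assemble a few-shot prompt using only examples from the given stage."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    # Keep only examples that match the student's stage, capped at max_shots.
    shots = [e for e in examples if e.stage == stage][:max_shots]
    lines = [
        f"You write C programming exercises for a student at the {stage} stage.",
        "",
    ]
    for e in shots:
        lines += [f"Skill: {e.skill}", f"Exercise: {e.exercise}", ""]
    # The final, unanswered slot is what the LLM is asked to complete.
    lines += [f"Skill: {skill}", "Exercise:"]
    return "\n".join(lines)
```

Calling `build_prompt("sensorimotor", "pointers")` would produce a prompt whose worked examples all come from the sensorimotor stage and whose last line is an open `Exercise:` slot for the model to fill. Filtering examples by stage is one simple way to steer generation toward stage-appropriate exercises; the thesis may use a different mechanism.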
2025
Fine-Tuned LLMs
Knowledge Tracing
Neo-Piaget Framework
Files in this item:
File: Sanchez_Michelle_Thesis.pdf
Access: open access
Size: 9.16 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/108171