Exploring novel applications of Direct Preference Optimization for Pedagogical ChatBots

GIROTTO, PIETRO
2024/2025

Abstract

This thesis explores the alignment of Large Language Models (LLMs) for educational purposes through the creation of a pedagogically structured preference dataset and its use with Direct Preference Optimization (DPO). The work introduces a novel conversational tree methodology to transform static educational content into dynamic multi-turn dialogues, embedding core pedagogical principles such as constructivist learning and adaptive feedback. This approach enables the scalable generation of training data that balances instructional quality with conversational coherence, while addressing challenges such as memory constraints through efficient training techniques. Key contributions include a systematic framework for converting raw educational materials into preference-optimized dialogues, strategies to mitigate model instability, and insights into balancing pedagogical alignment with conversational flexibility. Despite residual challenges such as occasional unintended behaviours, the results demonstrate the feasibility of training LLMs to sustain pedagogically sound interactions. The study advances the development of AI-driven educational tools, offering a pathway to bridge formal pedagogy with conversational AI while prioritizing resource efficiency and scalability.
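To make the approach concrete, the sketch below shows how a pedagogical preference dataset of the kind described above could be paired with DPO using the Hugging Face TRL library. The base model name, hyperparameters, LoRA settings, and the example preference pair are illustrative assumptions, not the thesis's actual configuration, and TRL argument names vary somewhat across library versions.

```python
# Hypothetical sketch: DPO training on a pedagogical preference dataset
# with Hugging Face TRL. Model name, hyperparameters, and the sample
# preference pair are assumptions for illustration only.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Llama-3.2-1B-Instruct"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Each example pairs a dialogue prefix (prompt) with a pedagogically
# preferred tutor reply (chosen) and a less desirable one (rejected),
# e.g. a guiding question versus a blunt answer dump.
preference_data = Dataset.from_list([
    {
        "prompt": "Student: I don't understand why the sky is blue.",
        "chosen": "Good question! What do you already know about how "
                  "sunlight interacts with the atmosphere?",
        "rejected": "The sky is blue because of Rayleigh scattering. "
                    "Next question.",
    },
])

# LoRA keeps the number of trainable parameters small, one common way to
# work within the memory constraints mentioned in the abstract.
peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

training_args = DPOConfig(
    output_dir="dpo-pedagogical-tutor",
    beta=0.1,                       # strength of the KL penalty toward the reference model
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # trades extra steps for lower peak memory
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=preference_data,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

With a PEFT config supplied, TRL can derive the frozen reference policy implicitly from the adapter-free base weights, which avoids keeping a second full model in memory; this matches the resource-efficiency emphasis of the abstract, though the thesis's own training setup may differ.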
Keywords: ChatBot, LLM, DPO, RLHF, NLP
File: Girotto_Pietro.pdf (Adobe PDF, 12.65 MB, open access)


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/83213