Leveraging Open Vision-Language Models as Teachers in Reinforcement Learning
GENILOTTI, FABRIZIO
2024/2025
Abstract
Reinforcement Learning (RL) has emerged as a fundamental paradigm for training agents to solve decision-making and control tasks. However, when applied to complex or large-scale environments, RL often faces challenges related to sample efficiency and long training times, primarily due to the extensive exploration required. At the same time, recent advances in multimodal Large Language Models (LLMs), particularly Vision-Language Models (VLMs), have demonstrated remarkable capabilities in tasks requiring visual understanding. This study explores the integration of VLMs into RL through knowledge distillation. Specifically, we investigate an approach that leverages VLMs as teacher models to guide the training of student agents optimized with Proximal Policy Optimization (PPO). The objective is to improve sample efficiency and accelerate learning by transferring decision-making capabilities from the VLM teacher to the RL agent. Experiments are conducted in the MiniGrid benchmark environments to evaluate the impact of VLM-based knowledge distillation compared with standard PPO training and with knowledge transfer from purely text-based LLMs. Additional experiments are conducted in Super Mario Bros (1985) to assess the effectiveness of the approach in more complex applications. Results indicate that VLM teachers provide slight improvements in training efficiency within this setting, suggesting that open VLMs can serve as effective teachers for RL agents. These results open opportunities for further research into scaling the approach and exploring its potential across more diverse environments and applications.
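To make the teacher-student setup concrete, the sketch below shows one way a distillation term could be combined with a PPO-style update. It is an illustrative sketch only, not the implementation from the thesis: the student network, the coefficient `DISTILL_COEF`, and the `vlm_teacher_distribution` stub (which in practice would query an open VLM prompted with a rendered image of the environment state) are all hypothetical.

```python
# Illustrative sketch: a clipped PPO loss augmented with a knowledge-distillation
# term that pulls the student's action distribution toward a (stubbed) VLM
# teacher's suggestion. Names and hyperparameters are assumptions, not taken
# from the thesis.

import torch
import torch.nn as nn
import torch.nn.functional as F

N_ACTIONS = 7       # e.g. the MiniGrid discrete action space (assumed)
DISTILL_COEF = 0.1  # weight of the teacher-distillation term (assumed)
CLIP_EPS = 0.2      # standard PPO clipping parameter


class StudentPolicy(nn.Module):
    """Tiny actor-critic network standing in for the PPO student agent."""

    def __init__(self, obs_dim: int = 147, n_actions: int = N_ACTIONS):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.actor = nn.Linear(64, n_actions)
        self.critic = nn.Linear(64, 1)

    def forward(self, obs: torch.Tensor):
        h = self.body(obs)
        return self.actor(h), self.critic(h).squeeze(-1)


def vlm_teacher_distribution(obs: torch.Tensor) -> torch.Tensor:
    """Stub for the VLM teacher.

    In the approach described in the abstract, an open VLM would be prompted
    with a rendered view of the environment state and its answer mapped to a
    distribution over the agent's actions. Here a uniform distribution is
    returned so the sketch runs end to end.
    """
    batch = obs.shape[0]
    return torch.full((batch, N_ACTIONS), 1.0 / N_ACTIONS)


def ppo_distill_loss(policy, obs, actions, old_log_probs, advantages, returns):
    """Clipped PPO surrogate plus a KL term distilling the teacher's suggestions."""
    logits, values = policy(obs)
    dist = torch.distributions.Categorical(logits=logits)

    # Standard clipped surrogate objective.
    ratio = torch.exp(dist.log_prob(actions) - old_log_probs)
    surr = torch.min(ratio * advantages,
                     torch.clamp(ratio, 1 - CLIP_EPS, 1 + CLIP_EPS) * advantages)
    policy_loss = -surr.mean()

    value_loss = F.mse_loss(values, returns)

    # Distillation: KL(teacher || student) over the action distribution.
    teacher_probs = vlm_teacher_distribution(obs)
    student_log_probs = F.log_softmax(logits, dim=-1)
    distill_loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

    return policy_loss + 0.5 * value_loss + DISTILL_COEF * distill_loss


if __name__ == "__main__":
    policy = StudentPolicy()
    obs = torch.randn(8, 147)  # fake flattened 7x7x3 MiniGrid observations
    actions = torch.randint(0, N_ACTIONS, (8,))
    old_log_probs = torch.zeros(8)
    advantages = torch.randn(8)
    returns = torch.randn(8)
    loss = ppo_distill_loss(policy, obs, actions, old_log_probs, advantages, returns)
    print(f"combined loss: {loss.item():.3f}")
```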
| File | Size | Format |
|---|---|---|
| Genilotti_Fabrizio.pdf (open access) | 3.07 MB | Adobe PDF |
https://hdl.handle.net/20.500.12608/99044