Evaluating Deep Reinforcement Learning Algorithms for Autonomous Navigation on Edge Devices
DAL NEVO, MATTEO
2024/2025
Abstract
This thesis investigates the application of Deep Reinforcement Learning (Deep-RL) algorithms to autonomous navigation tasks in resource-constrained environments. Specifically, we focus on three state-of-the-art continuous control algorithms: Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC), assessing their performance, computational requirements, and sim-to-real transferability. The experimental framework progresses from simulation to real-world deployment on a custom TurtleBot3 platform, addressing the challenges of running Deep-RL solutions on edge computing devices. Results demonstrate that simulation pre-training followed by real-world fine-tuning yields significant gains in learning efficiency over training from scratch, and that algorithms capable of higher control frequencies (DDPG and TD3) can outperform slower ones (SAC) in resource-constrained settings. This contrasts with common practice in the field, where SAC is typically the algorithm of choice owing to its strong performance in computationally unconstrained settings. Drawing on more than 120 hours of real-world experiments, this work provides evidence of a possible gap between the on-paper and real-world performance of Deep-RL algorithms, stemming from their differing computational requirements and assumptions.
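The control-frequency claim can be made concrete with a small timing experiment. The sketch below is not taken from the thesis; the observation size, network widths, and class names are illustrative assumptions. It contrasts the per-action inference cost of a deterministic TD3/DDPG-style actor with that of a stochastic Gaussian SAC-style actor on a single CPU thread, the kind of budget an edge device imposes.

```python
# Hypothetical sketch: per-action inference cost of a deterministic
# (TD3/DDPG-style) actor vs. a stochastic Gaussian (SAC-style) actor.
# Dimensions and architectures are illustrative assumptions, not the
# thesis's actual configuration.
import time
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, HIDDEN = 24, 2, 256  # assumed laser+goal obs, (v, w) action

class DeterministicActor(nn.Module):
    """TD3/DDPG-style actor: one forward pass yields the action directly."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, ACT_DIM), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)

class GaussianActor(nn.Module):
    """SAC-style actor: outputs mean and log-std, samples with the
    reparameterization trick, then squashes through tanh."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(OBS_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
        )
        self.mu = nn.Linear(HIDDEN, ACT_DIM)
        self.log_std = nn.Linear(HIDDEN, ACT_DIM)

    def forward(self, obs):
        h = self.trunk(obs)
        std = self.log_std(h).clamp(-20, 2).exp()
        return torch.tanh(self.mu(h) + std * torch.randn_like(std))

def ms_per_action(actor, n=1000):
    """Average wall-clock milliseconds per action over n forward passes."""
    obs = torch.randn(1, OBS_DIM)
    with torch.no_grad():
        t0 = time.perf_counter()
        for _ in range(n):
            actor(obs)
    return (time.perf_counter() - t0) / n * 1e3

if __name__ == "__main__":
    torch.set_num_threads(1)  # mimic a single-core edge-device budget
    for name, actor in [("TD3-style", DeterministicActor()),
                        ("SAC-style", GaussianActor())]:
        print(f"{name}: {ms_per_action(actor):.3f} ms per action")
```

On constrained hardware, the Gaussian policy's extra output head and sampling step add per-step latency, which is one plausible mechanism by which a nominally stronger algorithm ends up with a lower achievable control frequency.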
| File | Size | Format | Access |
|---|---|---|---|
| DalNevo_Matteo.pdf | 8.26 MB | Adobe PDF | Open access |
https://hdl.handle.net/20.500.12608/86930