This thesis addresses the critical challenge of learning robust and data-efficient policies for aerial physical interaction tasks using Imitation Learning. An impedance controller serves as the expert, tracking a trajectory that physically interacts with the environment in a compliant way, and a neural network policy is trained to reproduce this behavior. To enable robust learning from a single expert demonstration, a novel data augmentation technique is proposed: Sampling Augmentation (SA), based on closed-loop state sensitivity. This method exploits sensitivity tubes to sample additional state-action pairs along the trajectory, drastically enriching the training dataset without requiring additional, time-consuming rollouts. The approach is benchmarked against a Domain Randomization (DR) baseline. Experimental validation is performed on a fully actuated tilted hexarotor across a suite of tasks, including hovering under external disturbances and complex real-world push-and-slide interactions. Results show that, while all learned policies are able to imitate the expert, only the policy trained with SA achieves stable behavior from a single demonstration, drastically reducing the time needed to collect the necessary data. Furthermore, SA demonstrates superior robustness to both model uncertainties and external perturbations, consistently succeeding in challenging push-and-slide tasks that caused the DR baseline to fail.
This thesis aims at learning robust and data-efficient policies for aerial physical interaction tasks using Imitation Learning. An impedance controller is employed as the expert to track a trajectory that interacts physically with the environment safely, and a neural network is trained to reproduce this behavior. To enable robust learning from a single expert demonstration, a new data augmentation technique is proposed: Sampling Augmentation (SA), based on closed-loop state sensitivity. This method exploits sensitivity tubes to systematically sample additional state-action pairs along the trajectory, enriching the dataset without requiring further costly expert demonstrations. The approach is compared with Domain Randomization (DR), treated as the reference for the results. Experimental validation is carried out on a fully actuated hexarotor across a series of tasks, including hovering under external disturbances and real-world object-pushing objectives. The results show that, although all learned policies are able to imitate the expert, only the policy trained with SA successfully reaches stability using a single demonstration, drastically reducing the time needed to collect the data. Moreover, SA shows superior robustness to both model uncertainties and external perturbations, consistently succeeding in the most demanding object-pushing tasks, in which the DR policy failed.
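The idea of sampling extra state-action pairs inside a sensitivity tube can be illustrated with a toy sketch. This is not the thesis implementation: it assumes a 1-D point mass with an uncertain mass, a hypothetical spring-damper expert standing in for the impedance controller, and a first-order tube radius taken as the absolute sensitivity times an assumed parameter deviation `delta_m`. All function names, gains, and values are illustrative assumptions.

```python
import numpy as np

def expert_action(q, q_ref, k=4.0, d=4.0):
    """Hypothetical impedance-like expert: spring-damper force toward q_ref."""
    return -k * (q[..., 0] - q_ref) - d * q[..., 1]

def rollout_with_sensitivity(q0, q_ref, m_nom=1.0, k=4.0, d=4.0, dt=0.01, steps=500):
    """Euler-integrate the nominal closed loop and its sensitivity Pi = dq/dm."""
    q = np.array(q0, dtype=float)   # state: [position, velocity]
    pi = np.zeros(2)                # sensitivity of the state w.r.t. the mass
    states, actions, sens = [], [], []
    # Closed-loop Jacobian df/dq + (df/du)(du/dq), constant for this linear toy.
    A = np.array([[0.0, 1.0], [-k / m_nom, -d / m_nom]])
    for _ in range(steps):
        u = expert_action(q, q_ref, k, d)
        states.append(q.copy())
        actions.append(u)
        sens.append(pi.copy())
        b = np.array([0.0, -u / m_nom**2])        # df/dm at the nominal mass
        q_dot = np.array([q[1], u / m_nom])
        pi = pi + dt * (A @ pi + b)               # sensitivity dynamics
        q = q + dt * q_dot                        # nominal dynamics
    return np.array(states), np.array(actions), np.array(sens)

def sample_augmentation(states, sens, q_ref, delta_m=0.3, n_per_step=5, seed=0):
    """Sample states inside the per-step tube and label them with the expert."""
    rng = np.random.default_rng(seed)
    radius = np.abs(sens) * delta_m               # assumed first-order tube radius
    offsets = rng.uniform(-1.0, 1.0, size=(len(states), n_per_step, 2))
    aug_states = states[:, None, :] + offsets * radius[:, None, :]
    aug_states = aug_states.reshape(-1, 2)
    aug_actions = expert_action(aug_states, q_ref)  # no extra rollouts needed
    return aug_states, aug_actions

# One nominal "demonstration", then augmentation around it.
states, actions, sens = rollout_with_sensitivity(q0=[0.0, 0.0], q_ref=1.0)
aug_states, aug_actions = sample_augmentation(states, sens, q_ref=1.0)
```

In this sketch the augmented pairs come from querying the expert at perturbed states, so the dataset grows from a single rollout; a policy network would then be fit to the union of `(states, actions)` and `(aug_states, aug_actions)`.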
One-Shot Imitation Learning for Aerial Physical Interaction Tasks via Sensitivity-Guided Data Augmentation
BORGHERINI, ALESSANDRO
2024/2025
| File | Size | Format | Access |
|---|---|---|---|
| Borgherini_Alessandro.pdf | 12.19 MB | Adobe PDF | Open access |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/98049