Offline Reinforcement Learning for Dexterous Robotic Hand Manipulation
CREMA, UMBERTO
2024/2025
Abstract
Dexterous robotic manipulation is one of the most challenging control problems in robotics, owing to its high-dimensional, complex dynamics. Offline reinforcement learning offers a promising approach for training manipulation policies without costly or risky online interaction with the environment. This work applies offline reinforcement learning algorithms to robotic control, specifically the dexterous manipulation tasks of the Adroit Hand suite included in the D4RL datasets. Policies are trained exclusively on pre-collected datasets, without any additional online experience, and their performance is assessed through online episodes in simulation. Several offline RL methods are evaluated: Advantage-Weighted Actor-Critic (AWAC), Conservative Q-Learning (CQL), Implicit Q-Learning (IQL), and Twin Delayed Deep Deterministic Policy Gradient with Behavior Cloning (TD3+BC). A comparison with a supervised learning approach, Behavior Cloning (BC), is also provided to highlight the differences in performance and learning capability. Results show that offline RL algorithms, particularly IQL and TD3+BC, can learn robust policies from limited expert data. In parallel, a 3D-printed robotic hand has been developed as a prototype for future sim-to-real transfer.
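As a brief illustration of one of the methods named above, the sketch below shows the asymmetric expectile loss at the core of IQL's value-function update: with an expectile parameter above 0.5, positive differences between Q(s, a) and V(s) are weighted more heavily, pushing V(s) toward an upper expectile of Q over dataset actions without querying out-of-distribution actions. This is a minimal NumPy sketch for intuition only; the function name and the default `tau` value are illustrative and not taken from the thesis.

```python
import numpy as np

def expectile_loss(diff, tau=0.7):
    """IQL-style asymmetric squared loss.

    diff : array of Q(s, a) - V(s) values over a batch.
    tau  : expectile in (0, 1); tau > 0.5 penalizes underestimation
           of Q more than overestimation, so V tracks an optimistic
           (upper-expectile) estimate of the dataset's action values.
    """
    weight = np.where(diff > 0, tau, 1.0 - tau)  # asymmetric weight per sample
    return weight * diff ** 2

# A positive difference of 1.0 is weighted by tau, a negative one by 1 - tau.
print(expectile_loss(np.array([1.0, -1.0]), tau=0.7))  # [0.7 0.3]
```

At `tau = 0.5` this reduces to the ordinary (scaled) squared loss, which is one way to see why IQL generalizes plain value regression.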
| File | Size | Format |
|---|---|---|
| Crema_Umberto.pdf (embargo until 24/07/2028) | 7.4 MB | Adobe PDF |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/89784