Offline Reinforcement Learning for Dexterous Robotic Hand Manipulation
CREMA, UMBERTO
2024/2025
Abstract
Dexterous robotic manipulation is one of the most challenging control problems in robotics, owing to its high-dimensional, complex dynamics. Offline reinforcement learning offers a promising approach for training manipulation policies without costly or risky online interaction with the environment. This work applies offline reinforcement learning algorithms to robotic control, specifically the dexterous manipulation tasks of the Adroit Hand suite included in the D4RL datasets. Policies are trained exclusively on pre-collected datasets, without any additional online experience, and their performance is assessed through online episodes in simulation. Several offline RL methods are evaluated: Advantage-Weighted Actor-Critic (AWAC), Conservative Q-Learning (CQL), Implicit Q-Learning (IQL), and Twin Delayed Deep Deterministic Policy Gradient with Behavior Cloning (TD3+BC). A comparison with a supervised learning approach, Behavior Cloning (BC), is also provided to highlight the differences in performance and learning capability. Results show that offline RL algorithms, particularly IQL and TD3+BC, can learn robust policies from limited expert data. In parallel, a 3D-printed robotic hand has been developed as a prototype for future sim-to-real transfer.
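As a brief illustration of one of the methods named above, the sketch below shows the asymmetric expectile loss at the core of IQL's value-function update: with an expectile parameter above 0.5, positive differences between Q(s, a) and V(s) are weighted more heavily, pushing V(s) toward an upper expectile of Q over dataset actions without querying out-of-distribution actions. This is a minimal NumPy sketch for intuition only; the function name and the default `tau` value are illustrative and not taken from the thesis.

```python
import numpy as np

def expectile_loss(diff, tau=0.7):
    """IQL-style asymmetric squared loss.

    diff : array of Q(s, a) - V(s) values over a batch.
    tau  : expectile in (0, 1); tau > 0.5 penalizes underestimation
           of Q more than overestimation, so V tracks an optimistic
           (upper-expectile) estimate of the dataset's action values.
    """
    weight = np.where(diff > 0, tau, 1.0 - tau)  # asymmetric weight per sample
    return weight * diff ** 2

# A positive difference of 1.0 is weighted by tau, a negative one by 1 - tau.
print(expectile_loss(np.array([1.0, -1.0]), tau=0.7))  # [0.7 0.3]
```

At `tau = 0.5` this reduces to the ordinary (scaled) squared loss, which is one way to see why IQL generalizes plain value regression.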
| File | Size | Format |
|---|---|---|
| Crema_Umberto.pdf (embargo until 24/07/2028) | 7.4 MB | Adobe PDF |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/89784