A foundation model for robotic manipulation: GAM, Grasp-Anything-Model
VILLANI, MATTEO
2024/2025
Abstract
This thesis introduces the Grasp-Anything-Model (GAM), a modular, interpretable, and ROS2-native grasping pipeline designed to enable zero-shot manipulation of arbitrary objects in unstructured environments. Built for the Tiago family of service robots developed by PAL Robotics, GAM integrates prompt-grounded perception, single-view 3D reconstruction, grasp pose generation, and task-aware motion planning into a unified framework. Unlike monolithic or dataset-bound systems, GAM is composed of decoupled, service-oriented modules, including Grounded-Segment-Anything (GSAM), Multiview Compressive Coding (MCC), and the MoveIt Task Constructor (MTC), to ensure component reusability, architectural transparency, and diagnostic granularity. The system is evaluated in both simulated and real-world settings using a representative set of objects varying in geometry, size, and occlusion. Empirical results demonstrate GAM’s ability to generalize grasp strategies without object-specific retraining, achieve stable motion execution in partially observed scenes, and operate under CPU-only constraints with a total runtime under one minute per grasp. Planning latency is significantly reduced by MTC’s structured action pipeline, while grasp success rates exceed 70% on average across diverse object categories. Nonetheless, limitations in sensor quality, segmentation under ambiguity, and grasp planning near constrained surfaces highlight opportunities for further optimization. By prioritizing modularity over end-to-end speed, this work contributes a foundational manipulation framework suitable for deployment, extension, and research in real-world robotic systems. GAM is not presented as a final grasping solution, but as a robust platform upon which future advances in data-driven manipulation, adaptive behaviour logic, and system-level autonomy can be built.
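To make the service-oriented decomposition concrete, below is a minimal sketch of how such a pipeline can be orchestrated in ROS2 with rclpy. The node name, the service names (`/gsam/segment`, `/mcc/reconstruct`, `/mtc/plan_grasp`), and the use of `std_srvs/Trigger` as the interface type are all illustrative assumptions; the thesis's actual interfaces are not published on this page.

```python
# Illustrative sketch only: the service names and the std_srvs/Trigger
# interface are placeholder assumptions, not GAM's actual definitions.
import rclpy
from rclpy.node import Node
from std_srvs.srv import Trigger


class GraspPipeline(Node):
    """Chains decoupled perception, reconstruction, and planning services."""

    def __init__(self):
        super().__init__('gam_pipeline')
        # One client per module keeps each component swappable and lets a
        # failure be attributed to a single stage (diagnostic granularity).
        self.stages = [
            ('GSAM', self.create_client(Trigger, '/gsam/segment')),    # prompt-grounded perception
            ('MCC', self.create_client(Trigger, '/mcc/reconstruct')),  # single-view 3D reconstruction
            ('MTC', self.create_client(Trigger, '/mtc/plan_grasp')),   # task-aware motion planning
        ]

    def run(self) -> bool:
        for name, client in self.stages:
            client.wait_for_service()
            future = client.call_async(Trigger.Request())
            rclpy.spin_until_future_complete(self, future)
            result = future.result()
            if result is None or not result.success:
                self.get_logger().error(f'{name} stage failed; aborting grasp.')
                return False
            self.get_logger().info(f'{name} stage complete.')
        return True


def main():
    rclpy.init()
    node = GraspPipeline()
    try:
        node.run()
    finally:
        node.destroy_node()
        rclpy.shutdown()


if __name__ == '__main__':
    main()
```

Because each stage sits behind an independent service boundary, any single module (for example, swapping GSAM for a different segmenter) could be replaced without touching the rest of the pipeline, which is the reusability property the abstract emphasizes.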
| File | Size | Format |
|---|---|---|
| Villani_Matteo.pdf (embargo until 07/07/2026) | 4.67 MB | Adobe PDF |
https://hdl.handle.net/20.500.12608/86956