A foundation model for robotic manipulation: GAM, Grasp-Anything-Model
VILLANI, MATTEO
2024/2025
Abstract
This thesis introduces the Grasp-Anything-Model (GAM), a modular, interpretable, and ROS2-native grasping pipeline designed to enable zero-shot manipulation of arbitrary objects in unstructured environments. Built for the Tiago family of service robots developed by PAL Robotics, GAM integrates prompt-grounded perception, single-view 3D reconstruction, grasp pose generation, and task-aware motion planning into a unified framework. Unlike monolithic or dataset-bound systems, GAM is composed of decoupled, service-oriented modules, including Grounded-Segment-Anything (GSAM), Multiview Compressive Coding (MCC), and the MoveIt Task Constructor (MTC), to ensure component reusability, architectural transparency, and diagnostic granularity. The system is evaluated in both simulated and real-world settings using a representative set of objects varying in geometry, size, and occlusion. Empirical results demonstrate GAM’s ability to generalize grasp strategies without object-specific retraining, achieve stable motion execution in partially observed scenes, and operate under CPU-only constraints with a total runtime under one minute per grasp. Planning latency is significantly reduced by MTC’s structured action pipeline, while grasp success rates exceed 70% on average across diverse object categories. Nonetheless, limitations in sensor quality, segmentation under ambiguity, and grasp planning near constrained surfaces highlight opportunities for further optimization. By prioritizing modularity over end-to-end speed, this work contributes a foundational manipulation framework suitable for deployment, extension, and research in real-world robotic systems. GAM is not presented as a final grasping solution, but as a robust platform upon which future advances in data-driven manipulation, adaptive behaviour logic, and system-level autonomy can be built.
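To make the service-oriented decomposition concrete, below is a minimal sketch of how such a pipeline can be orchestrated in ROS2 with rclpy. The node name, the service names (`/gsam/segment`, `/mcc/reconstruct`, `/mtc/plan_grasp`), and the use of `std_srvs/Trigger` as the interface type are all illustrative assumptions; the thesis's actual interfaces are not published on this page.

```python
# Illustrative sketch only: the service names and the std_srvs/Trigger
# interface are placeholder assumptions, not GAM's actual definitions.
import rclpy
from rclpy.node import Node
from std_srvs.srv import Trigger


class GraspPipeline(Node):
    """Chains decoupled perception, reconstruction, and planning services."""

    def __init__(self):
        super().__init__('gam_pipeline')
        # One client per module keeps each component swappable and lets a
        # failure be attributed to a single stage (diagnostic granularity).
        self.stages = [
            ('GSAM', self.create_client(Trigger, '/gsam/segment')),    # prompt-grounded perception
            ('MCC', self.create_client(Trigger, '/mcc/reconstruct')),  # single-view 3D reconstruction
            ('MTC', self.create_client(Trigger, '/mtc/plan_grasp')),   # task-aware motion planning
        ]

    def run(self) -> bool:
        for name, client in self.stages:
            client.wait_for_service()
            future = client.call_async(Trigger.Request())
            rclpy.spin_until_future_complete(self, future)
            result = future.result()
            if result is None or not result.success:
                self.get_logger().error(f'{name} stage failed; aborting grasp.')
                return False
            self.get_logger().info(f'{name} stage complete.')
        return True


def main():
    rclpy.init()
    node = GraspPipeline()
    try:
        node.run()
    finally:
        node.destroy_node()
        rclpy.shutdown()


if __name__ == '__main__':
    main()
```

Because each stage sits behind an independent service boundary, any single module (for example, swapping GSAM for a different segmenter) could be replaced without touching the rest of the pipeline, which is the reusability property the abstract emphasizes.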
| File | Size | Format |
|---|---|---|
| Villani_Matteo.pdf (embargo until 07/07/2026) | 4.67 MB | Adobe PDF |
https://hdl.handle.net/20.500.12608/86956