6DoF Pose Estimation for Robotic Manipulation using Synthetic Data and Stereo Vision

PILOTTO, FEDERICO
2025/2026

Abstract

3D object pose estimation plays a crucial role in many computer vision applications, such as object tracking and robotic manipulation, and it has gained increasing importance in 3D vision over the past decade. Most current approaches are based on deep learning and often require large datasets of labeled images that are expensive to create and maintain. This thesis proposes a model-based computer vision approach that avoids the need for such datasets by leveraging synthetic data generated directly from CAD models. The pipeline consists of two main phases. In the offline phase, a virtual environment simulates the camera parameters and generates a comprehensive dataset of synthetic views, which is crucial for accurately mapping 2D features to the corresponding 3D vertices in object coordinate space. In the online phase, depending on the surface texture of the target object, the stereo images are processed to extract either geometric edges or local keypoints. The synthetic and online information are then matched to establish robust and reliable 2D-3D correspondences, enabling the Perspective-n-Point (PnP) algorithm to estimate the 6DoF pose. The entire system is implemented in the .NET environment and demonstrates the feasibility of relying exclusively on synthetic data to localize objects in real-world scenarios.
Keywords: Pose Estimation, Robotic Manipulation, Stereo Vision
File in this record: Pilotto_Federico.pdf (open access, Adobe PDF, 3.84 MB)
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/106490