6DoF Pose Estimation for Robotic Manipulation using Synthetic Data and Stereo Vision

PILOTTO, FEDERICO
2025/2026

Abstract

3D object pose estimation plays a crucial role in many computer vision applications, such as object tracking and robotic manipulation, and it has gained increasing importance in 3D vision over the past decade. Most current approaches are based on deep learning and often require large datasets of labeled images that are expensive to create and maintain. This thesis proposes a model-based computer vision approach that avoids the need for such datasets by leveraging synthetic data generated directly from CAD models. The pipeline consists of two main phases. In the offline phase, a virtual environment simulates the camera parameters and generates a comprehensive dataset of synthetic views, which is crucial for accurately mapping 2D features to the corresponding 3D vertices in object coordinate space. In the online phase, depending on the surface texture of the target object, the stereo images are processed to extract either geometric edges or local keypoints. The synthetic and online information are then matched to establish robust and reliable 2D-3D correspondences, enabling the Perspective-n-Point (PnP) algorithm to estimate the 6DoF pose. The entire system is implemented in the .NET environment and demonstrates the feasibility of relying exclusively on synthetic data to localize objects in real-world scenarios.
Keywords: Pose Estimation, Robotic Manipulation, Stereo Vision
File in this record: Pilotto_Federico.pdf (open access, Adobe PDF, 3.84 MB)
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/106490