RGB-based monocular 6D pose estimation: selecting the best methods for real industrial scenarios

SCHIAVO, CARLOTTA

2023/2024

Abstract

6D pose estimation is a critical and widely researched problem in computer vision, with applications in augmented reality, robotic manipulation, industrial automation, and autonomous driving. It consists of determining an object's position and orientation in three-dimensional space, which enables precise interaction with physical objects and improves spatial awareness. This capability allows systems to guide robotic arms, detect obstacles for autonomous vehicles, and integrate virtual objects into real environments for augmented reality. Despite significant advances, several challenges persist. One major issue is the diversity of object categories and of the environments in which objects appear. Many learning-based methods are limited by the lack of large-scale, high-quality labeled datasets needed to train robust models. In addition, training state-of-the-art deep learning models for 6D pose estimation requires substantial computational resources, including powerful GPUs and large memory capacity, which are not always accessible or affordable. Reducing reliance on the expensive, specialized sensors often used to improve pose estimation accuracy is another active area of research. This thesis explores more affordable solutions for 6D pose estimation that do not depend on specialized sensors. Specifically, it focuses on state-of-the-art deep learning models that take monocular RGB images as input, with the aim of identifying lightweight, cost-effective methods. By analyzing leading RGB-based models in terms of accuracy, efficiency, and real-world applicability, the study evaluates their potential for deployment in practical applications. Two selected models are compared through accuracy experiments on a set of industrial objects. The better-performing model is then deployed in a simple real-world pick-and-place scenario to assess its precision, bridging the gap between academic research and real-world implementation.

Faculty/Department: Dipartimento di Ingegneria dell'Informazione - DEI
Degree programme: COMPUTER ENGINEERING, Laurea Magistrale (D.M. 270/2004)
Academic year: 2023
English title: RGB-based monocular 6D pose estimation: selecting the best methods for real industrial scenarios
Keywords: 6D pose estimation; 3D object detection; RGB; CAD; Industrial scenario
Supervisor: GHIDONI, STEFANO
Co-supervisor: TERRERAN, MATTEO
Appears in types: Master's theses (Lauree magistrali)

Files in this item:

File	Size	Format
Schiavo_Carlotta.pdf (restricted access)	17.25 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/80173