RGB-based monocular 6D pose estimation: selecting the best methods for real industrial scenarios

SCHIAVO, CARLOTTA
2023/2024

Abstract

6D pose estimation is a critical and widely researched problem in computer vision, with applications in augmented reality, robotic manipulation, industrial automation, and autonomous driving. It involves determining an object’s position and orientation in three-dimensional space (three translational and three rotational degrees of freedom), enabling precise interaction with physical objects and enhancing spatial awareness. This foundational capability allows systems to guide robotic arms, detect obstacles for autonomous vehicles, and integrate virtual objects into real environments for augmented reality. Despite significant advances, several challenges persist. A major issue is the diversity of object categories and of the environments in which objects appear. Many learning-based methods are limited by the lack of large-scale, high-quality labeled datasets needed to train robust models. Additionally, training state-of-the-art deep learning models for 6D pose estimation requires substantial computational resources, including powerful GPUs and large memory capacities, which are not always accessible or affordable. Reducing reliance on the expensive, specialized sensors often used to improve pose estimation accuracy is another key area of ongoing research. This thesis explores more affordable solutions for 6D pose estimation that do not depend on such sensors. Specifically, it focuses on state-of-the-art deep learning models that use monocular RGB images, aiming to identify lightweight, cost-effective methods. By analyzing the performance of leading RGB-based models in terms of accuracy, efficiency, and real-world applicability, this study seeks to evaluate their potential for deployment in practical applications. The research will compare two selected models through accuracy experiments on a set of industrial objects. The model demonstrating superior performance will then be deployed in a straightforward real-world pick-and-place scenario to assess its precision, bridging the gap between academic research and real-world implementation.
2023
6D pose estimation
3D object detection
RGB
CAD
Industrial scenario
Files in this item:

File: Schiavo_Carlotta.pdf (restricted access)
Size: 17.25 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/80173