Using GANs to Create Training Images for Object Pose Estimation

DEL BEN, ROBERTO
2022/2023

Abstract

Object pose estimation, the task of determining the position and orientation of an object within an image, is a significant challenge in computer vision. It requires a large training dataset covering many poses of the object, and collecting such data can be costly and labor-intensive. This thesis examines Generative Adversarial Networks (GANs), which can generate realistic images from random noise, studies several GAN architectures, and applies them to generate images usable for training object pose estimation models. Throughout the experiments, pre-rendered images of the Cat model from LINEMOD [1] serve as input to the GAN models, which aim to produce realistic images containing the reference Cat in the same predefined pose. First, a Domain Adaptation (DA) approach is used to generate underwater images from an unpaired dataset. This experiment serves as a proof of concept for generating images in environments where data collection is particularly difficult. Next, the focus shifts to generating new samples from the LINEMOD dataset. Here, the objective is to capture features and spatial relationships between objects in the real images and use them to generate new pictures consistent with the input Cat model. Finally, the best GAN models are combined with the Pixel-wise Voting Network (PVNet) [2], and their performance is compared against a PVNet model trained with the conventional image Superimposing Method (PVNet-SM). The main conclusion of this work is that GANs can achieve performance similar to that of PVNet-SM while eliminating the manual work of background selection and dataset preparation.

[1] S. Hinterstoisser, V. Lepetit, S. Ilic, et al., "Model-based training, detection, and pose estimation of texture-less 3D objects in heavily cluttered scenes," in Computer Vision – ACCV 2012, K. M. Lee, Y. Matsushita, J. M. Rehg, and Z. Hu, Eds., vol. 7724, Springer Berlin Heidelberg, 2012, pp. 548–562.
[2] S. Peng, Y. Liu, Q. Huang, X. Zhou, and H. Bao, "PVNet: Pixel-wise voting network for 6-DOF pose estimation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
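The baseline the GAN-generated images are compared against, the Superimposing Method behind PVNet-SM, can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not code from the thesis or from PVNet: it assumes a renderer already provides an RGB image of the object plus a binary mask of the pixels it covers, and that the candidate background photos have the same size. The helpers `superimpose` and `make_training_sample` are hypothetical names. Each sample pastes the rendered object onto a randomly chosen background, so the pose label of the render carries over unchanged, while the realism of the result depends on how the background pool is chosen, which is the manual step the abstract refers to.

```python
import numpy as np


def superimpose(render: np.ndarray, mask: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Paste a rendered object onto a background (hypothetical helper).

    render:     H x W x 3 uint8 image of the rendered object.
    mask:       H x W boolean array, True where the object is visible.
    background: H x W x 3 uint8 background photo of the same size.
    """
    if render.shape != background.shape or mask.shape != render.shape[:2]:
        raise ValueError("render, mask and background must share the same H x W")
    # Broadcast the 2-D mask over the colour channels: object pixels where the
    # mask is True, background pixels everywhere else.
    return np.where(mask[..., None], render, background)


def make_training_sample(render, mask, backgrounds, rng):
    """Pick a random background and composite the render onto it (hypothetical)."""
    background = backgrounds[rng.integers(len(backgrounds))]
    return superimpose(render, mask, background)


# Usage on toy 4 x 4 images: a grey 2 x 2 "object" pasted onto random noise.
rng = np.random.default_rng(0)
render = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
backgrounds = [rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8) for _ in range(3)]
sample = make_training_sample(render, mask, backgrounds, rng)
```

Plain pasting of this kind does not adapt lighting or shadows to the chosen background, which is one plausible reason a learned, GAN-based generator is attractive here.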
2022
Pose Estimation
GAN
Synthetic dataset
Files in this item: DelBen_Roberto.pdf (Adobe PDF, 10.87 MB, open access)

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/54924