Using GANs to Create Training Images for Object Pose Estimation

DEL BEN, ROBERTO

2022/2023

Abstract

Object pose estimation is a significant challenge in computer vision: it consists of determining the position and orientation of an object within an image. The task requires a large training dataset covering many poses of the object, and collecting such data can be costly and labor-intensive. This thesis examines Generative Adversarial Networks (GANs), which can generate realistic images from random noise. It studies several GAN architectures and applies them to generate images that can be used for object pose estimation.

Throughout the experiments, pre-rendered images of the Cat model from LINEMOD [1] serve as input to the GAN models considered, with the aim of producing realistic images that contain the reference Cat in the same predefined pose.

First, a Domain Adaptation (DA) approach was used to generate underwater images from an unpaired dataset. This experiment served as a proof of concept for generating images of environments where data collection is notably challenging. Next, the focus shifted to generating new samples from the LINEMOD dataset. Here, the objective is to capture features and spatial relationships between objects in the real images and to use them to generate new pictures coherent with the input Cat model. Finally, the best GAN models were combined with the Pixel-wise Voting Network (PVNet [2]), and their performance was compared against a PVNet model trained using the conventional image Superimposing Method (PVNet-SM).

The main conclusion of this work is that GANs can achieve performance similar to PVNet-SM while eliminating the need for manual work in background selection and dataset preparation.

[1] S. Hinterstoisser, V. Lepetit, S. Ilic, et al., "Model-based training, detection, and pose estimation of texture-less 3D objects in heavily cluttered scenes," in Computer Vision – ACCV 2012, K. M. Lee, Y. Matsushita, J. M. Rehg, and Z. Hu, Eds., vol. 7724, Springer Berlin Heidelberg, 2012, pp. 548–562.

[2] S. Peng, Y. Liu, Q. Huang, X. Zhou, and H. Bao, "PVNet: Pixel-wise voting network for 6-DOF pose estimation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

Record

	Faculty/Department
	
				Dipartimento di Ingegneria dell'Informazione - DEI
			
	Degree programme
	
				Automation Engineering, Master's degree (Laurea Magistrale, D.M. 270/2004)
			
	Academic year
	
				2022
			
	English title
	
				Using GANs to Create Training Images for Object Pose Estimation
			
	Keywords
	
				Pose Estimation
				GAN
				Synthetic dataset
			
	Supervisor
	
				PRETTO, ALBERTO
			
	Appears in collections:
	
				Master's theses

Files in this item:

File	Access	Size	Format
DelBen_Roberto.pdf	Open access	10.87 MB	Adobe PDF

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/54924