A Multi-view Pixel-wise Voting Network for 6DoF Pose Estimation

6DoF pose estimation is an important computer vision task, particularly for robotic and automotive applications. Many recent approaches successfully perform pose estimation on monocular images, which lack depth information. This work explores the potential of extending such methods to a multi-view setting, in order to recover depth information from the geometric relations between the views. In particular, two different multi-view adaptations of a specific monocular pose estimator, PVNet, are developed: one combines the monocular results obtained on the individual views, while the other modifies the original method to take the set of views directly as input. The new models are evaluated on the TOD transparent object dataset and compared against the original PVNet implementation, a depth-based pose estimation method called DenseFusion, and the method proposed by the authors of the dataset, called KeyPose. Experimental results show that integrating multi-view information significantly increases test accuracy and that both models outperform DenseFusion, while still being slightly surpassed by KeyPose.
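The abstract states that multi-view information is used to recover depth from the geometric relations between the views. As an illustration of that general idea only, the following is a minimal sketch of linear least-squares triangulation of a single keypoint (for example, a 2D keypoint that a per-view PVNet-style detector might output) from several calibrated views. It assumes known 3x4 projection matrices and already-matched 2D keypoints; the names `triangulate` and `_solve3` are hypothetical, and this is not claimed to be the fusion strategy used in the thesis.

```python
from typing import List, Sequence, Tuple

# 3x4 camera projection matrix P = K [R | t] (assumed known from calibration).
Matrix34 = Sequence[Sequence[float]]


def _solve3(m: List[List[float]], r: List[float]) -> List[float]:
    """Solve the 3x3 linear system m @ x = r with Cramer's rule."""
    def det(a: List[List[float]]) -> float:
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

    d = det(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate view configuration (need >= 2 distinct views)")
    solution = []
    for col in range(3):
        # Replace one column with the right-hand side (Cramer's rule).
        mc = [row[:] for row in m]
        for i in range(3):
            mc[i][col] = r[i]
        solution.append(det(mc) / d)
    return solution


def triangulate(projections: Sequence[Matrix34],
                points2d: Sequence[Tuple[float, float]]) -> List[float]:
    """Illustrative sketch, not the thesis implementation.

    Linear least-squares triangulation of one keypoint seen in >= 2 views.
    Each view with projection P and pixel (u, v) contributes two equations,
    u * (P[2] . X) = P[0] . X and v * (P[2] . X) = P[1] . X, with X = (x, y, z, 1).
    """
    rows, rhs = [], []
    for P, (u, v) in zip(projections, points2d):
        for coord, k in ((u, 0), (v, 1)):
            a = [coord * P[2][j] - P[k][j] for j in range(4)]
            rows.append(a[:3])   # coefficients of x, y, z
            rhs.append(-a[3])    # constant term moved to the right-hand side
    # Normal equations (A^T A) x = A^T b.
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    atb = [sum(row[i] * b for row, b in zip(rows, rhs)) for i in range(3)]
    return _solve3(ata, atb)


if __name__ == "__main__":
    # Two unit-focal cameras; the second is shifted by +1 along x (t = -C).
    P1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
    P2 = [[1, 0, 0, -1], [0, 1, 0, 0], [0, 0, 1, 0]]
    # Projections of the 3D point (0.5, 0.2, 4.0) into each view.
    print(triangulate([P1, P2], [(0.125, 0.05), (-0.125, 0.05)]))
```

Fixing the last homogeneous coordinate to 1 keeps the solver dependency-free; a production pipeline would more typically use an SVD-based DLT followed by non-linear refinement of the reprojection error.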

DONADI, IVANO

2021/2022

Record details

Faculty/Department: Dipartimento di Ingegneria dell'Informazione - DEI (Department of Information Engineering)
Degree programme: Computer Engineering, Laurea Magistrale (Master's degree, D.M. 270/2004)
Academic year: 2021
English title: A Multi-view Pixel-wise Voting Network for 6DoF Pose Estimation
Keywords: 6DoF Pose Estimation; 3D Object Detection; Deep Learning
Supervisor: PRETTO, ALBERTO
Appears in collections: Master's theses

Files in this item:

tesi_3d_pose_estimation_pdfA.pdf (open access, 3.39 MB, Adobe PDF)

The text of this website © Università degli Studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/31496