Learning Dense Semantic Embeddings for Descriptive 3D Scene Representations
GOBBIN, ALBERTO
2023/2024
Abstract
This thesis explores the integration of geometric and semantic scene understanding in computer vision. The work builds on recent advances in neural scene representation, particularly Neural Radiance Fields (NeRF), which have transformed how 3D scenes are reconstructed from images. These advances are combined with self-supervised learning techniques, specifically DINO descriptors, which provide semantic understanding without requiring extensive labeled datasets. The proposed method extends the HashNeRF framework with semantic feature prediction, retaining HashNeRF's efficient encoding and fast training while adding the ability to learn and predict DINO features. The experiments yield several findings about how such a unified system can be trained effectively. Contrary to initial expectations, introducing semantic learning gradually during training provided no significant benefit over learning both aspects simultaneously from the start. A careful balance of the different loss components proved crucial for optimal performance, with different loss functions showing distinct strengths in preserving either geometric or semantic fidelity. The results also showed that increasing the emphasis on geometric reconstruction improves visual quality at the cost of semantic feature accuracy, highlighting an inherent trade-off in such unified representations. These findings contribute to our understanding of how neural scene representations can be enriched with semantic information, potentially enabling more sophisticated scene understanding and manipulation in a range of computer vision applications.

File | Size | Format
---|---|---
Gobbin_Alberto.pdf (embargoed until 05/12/2027) | 2.42 MB | Adobe PDF
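The abstract describes balancing a geometric (RGB reconstruction) loss against a semantic (DINO-feature) loss via weighted terms. A minimal, self-contained sketch of such a weighted combination is shown below; the function names, weights, and plain-list inputs are illustrative assumptions, not taken from the thesis.

```python
# Hypothetical sketch of a weighted geometric + semantic loss,
# as the abstract describes. All names and default weights are
# illustrative; the thesis's actual formulation may differ.

def mse(pred, target):
    """Mean squared error over two equal-length sequences of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def combined_loss(rgb_pred, rgb_gt, feat_pred, feat_gt,
                  w_rgb=1.0, w_feat=0.1):
    """Weighted sum of a geometric and a semantic term.

    Raising w_rgb relative to w_feat favors visual fidelity at the
    expense of semantic feature accuracy, mirroring the trade-off
    reported in the abstract.
    """
    return w_rgb * mse(rgb_pred, rgb_gt) + w_feat * mse(feat_pred, feat_gt)
```

In a real NeRF training loop, `rgb_pred`/`feat_pred` would be per-ray renders and `rgb_gt`/`feat_gt` the image pixels and precomputed DINO descriptors; the scalar weights are the tunable balance the thesis studies.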
The text of this website © Università degli studi di Padova. Full Text are published under a non-exclusive license. Metadata are under a CC0 License
https://hdl.handle.net/20.500.12608/78055