
Apprendimento di Dense Semantic Embeddings per Rappresentazioni Descrittive di Scene 3D

Gobbin, Alberto
Academic year 2023/2024

Abstract

This thesis explores the integration of geometric and semantic scene understanding in computer vision. The work builds on recent advances in neural scene representation, particularly Neural Radiance Fields (NeRFs), which have transformed the reconstruction of 3D scenes from images. These advances are combined with self-supervised learning, specifically DINO descriptors, which provide semantic understanding without requiring extensive labeled datasets. The proposed method extends the HashNeRF framework with semantic feature prediction capabilities, retaining HashNeRF's efficient encoding and fast training while learning to predict DINO features. The experiments yield several findings about how such a unified system can be trained effectively. Contrary to initial expectations, introducing semantic learning gradually during training provided no significant benefit over learning both aspects simultaneously from the start. A careful balance of the loss components proved crucial for performance, with different loss functions showing different strengths in preserving geometric or semantic fidelity. The results also show that increasing the emphasis on geometric reconstruction improves visual quality at the cost of semantic feature accuracy, highlighting an inherent trade-off in such unified representations. These findings advance our understanding of how neural scene representations can be enriched with semantic information, potentially enabling more sophisticated scene understanding and manipulation in computer vision applications.
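The architecture and loss balancing described above can be illustrated with a minimal sketch. This is not the thesis's actual code: the shared backbone, the two heads (density/color and semantic descriptor), the feature dimensionality, and the weighting parameter `lam` are all illustrative assumptions standing in for the HashNeRF-plus-DINO design.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM = 64  # assumed DINO descriptor dimensionality for this sketch

def field(points, w_geo, w_sem):
    """Toy shared-backbone field: one hidden layer feeds three heads,
    mirroring a NeRF extended with a semantic feature output."""
    h = np.tanh(points @ w_geo["backbone"])       # shared point features
    sigma = np.maximum(h @ w_geo["sigma"], 0.0)   # density head (ReLU, >= 0)
    rgb = 1 / (1 + np.exp(-(h @ w_geo["rgb"])))   # color head (sigmoid, in (0, 1))
    feat = h @ w_sem["feat"]                      # semantic descriptor head (linear)
    return sigma, rgb, feat

def combined_loss(rgb_pred, rgb_gt, feat_pred, feat_gt, lam=0.1):
    """Weighted sum of a photometric and a semantic term; tuning `lam`
    trades geometric fidelity against semantic fidelity, as the
    abstract's reported trade-off suggests."""
    l_rgb = np.mean((rgb_pred - rgb_gt) ** 2)     # geometric/photometric term
    l_feat = np.mean((feat_pred - feat_gt) ** 2)  # semantic feature term
    return l_rgb + lam * l_feat, l_rgb, l_feat
```

In this toy form, both heads are supervised from the first step, consistent with the finding that a gradual introduction of the semantic term gave no measurable benefit over joint training from the start.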
Publication year: 2023
Title (English): Learning Dense Semantic Embeddings for Descriptive 3D Scene Representations
Keywords: Computer Vision, SSL, Semantic Embeddings, NeRF
Files in this item:
Gobbin_Alberto.pdf (2.42 MB, Adobe PDF), under embargo until 05/12/2027
The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license; metadata are released under a CC0 license.

Use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12608/78055