Leveraging generative models for the optimization of 3D implicit representations

TOSO, SIMONE
2023/2024

Abstract

In recent years, Neural Radiance Fields (NeRF) have been used to reconstruct 3D scenes from RGB images. They represent a density function and a color function with neural networks that are trained by minimizing a reconstruction loss with respect to the input views. A limitation of NeRFs is that they require many input views to achieve a good reconstruction. To overcome this problem, many works use generative models to optimize the NeRF parameters. These approaches often rely on Score Distillation Sampling, which leverages the prior of large diffusion models to obtain a believable 3D representation. Score Distillation Sampling can be used alongside the minimization of the reconstruction loss, yielding a 3D reconstruction from a smaller number of input views. In this thesis, we review recent work in the field and develop a new approach by building upon MVDream, an existing text-to-3D model. MVDream leverages multi-view diffusion models, which generate several views at the same time. At every iteration, the method renders four orthogonal views and uses them to compute the Score Distillation Sampling gradient; the denoising process is conditioned on a textual prompt describing the object. Our pipeline takes a set of input views with pose information and learns a NeRF representation by jointly minimizing a reconstruction loss and the Score Distillation Sampling loss, the latter evaluated with the MVDream denoising network. We show that the multi-view diffusion model makes it possible to effectively reconstruct areas of the object that were not seen in the input views.
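For reference, the joint objective described in the abstract can be written using the standard Score Distillation Sampling formulation from DreamFusion. The rendering function $g$, the balancing weight $\lambda$, and the timestep weighting $w(t)$ below are notational assumptions for illustration, not necessarily the symbols used in the thesis.

\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta) = \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\big)\, \frac{\partial x}{\partial \theta} \right], \qquad x = g(\theta, c),

\mathcal{L}(\theta) = \sum_{i} \big\lVert g(\theta, c_i) - I_i \big\rVert^2 \; + \; \lambda\, \mathcal{L}_{\mathrm{SDS}}(\theta),

where $\theta$ are the NeRF parameters, $g(\theta, c)$ renders a view at camera pose $c$, $x_t$ is the rendered image after adding noise $\epsilon$ at diffusion timestep $t$, $\hat{\epsilon}_\phi$ is the frozen denoising network (here, MVDream's multi-view network) conditioned on the text prompt $y$, and $(I_i, c_i)$ are the posed input views. In the multi-view setting, $x$ stacks the four orthogonal renderings that are denoised jointly at every iteration.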
Keywords: 3D reconstruction, NeRF, Deep Learning
File: Toso_Simone.pdf (Adobe PDF, 6.03 MB, restricted access)

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/74201