Variational Autoencoders and their use for Sound Generation


DE LUCA, CHIARA

2022/2023

Abstract

This thesis explores the use of Variational Autoencoders (VAEs) for sound generation, with a particular focus on timbral diversity and the wide range of possible sound transformations. Sound generation is approached from two distinct angles: harmonic sounds and non-harmonic soundscapes. Prior studies have shown that autoencoders can capture the primary features of a sound, building a latent space that preserves these features and from which similar sounds, sharing a timbral quality or musical intent, can be generated. This thesis therefore examines this sound generation system through multiple experiments that use mel-spectrograms as input. It also explores the latent space of the models in depth: because this space maps the characteristics of sound, timbres and sound changes can be manipulated easily within it, which enables smooth sound morphing. A questionnaire was administered to a group of participants to assess key aspects of the generated sound: sound quality, sound classification, and the smoothness of the generated morphings. The results were promising, indicating a good level of generation quality and a degree of fluidity in sound transformation for both harmonic and non-harmonic sounds. The research has practical applications in sound design and in the creation of background music generation systems. With strong prospects for sound manipulation and exploration, the approach presented is a promising blend of deep learning and musical knowledge.

Record

Faculty/Department	Dipartimento di Matematica "Tullio Levi-Civita" - DM
Degree programme	DATA SCIENCE Laurea Magistrale (D.M. 270/2004)
Academic year	2022
English title	Variational Autoencoders and their use for Sound Generation
Keywords	Deep Learning; VAE; Music Generation; Audio Representation
Supervisor	CANAZZA TARGON, SERGIO
Appears in collections	Master's degree theses (Lauree magistrali)

Files in this item:

File	Size	Format
DeLuca_Chiara.pdf (open access)	23.74 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/61379