Sound Generation using GAN Models

In recent years, the use of artificial intelligence to generate audio has increased noticeably. This master's thesis presents an approach to sound generation that combines multiple GAN models with post-processing techniques to generate diverse samples of different durations. The GAN models used include SpecGAN, Catch-A-Waveform (CAW) and UNAGAN. The proposed pipeline consists of the following components:
- SpecGAN, employed to generate one-second inharmonic samples.
- UNAGAN, used to generate variable-length harmonic samples.
- Post-processing, applied to minimize the noise in the generated samples.
- Post-processing and CAW, used to concatenate and inpaint multiple samples generated with SpecGAN or UNAGAN.
- CAW, which, starting from a single reference sample, is used to generate longer and cleaner samples.
Experiments were conducted on both harmonic and inharmonic sound datasets. The results show that this pipeline can generate variable-length sounds, both harmonic and inharmonic, with quality comparable to that of real samples.
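The abstract says that short generated samples are concatenated and inpainted with post-processing and CAW, but this record does not describe how. Purely as an illustration of joining short clips into a longer one, the sketch below uses a plain linear crossfade. This is a much simpler stand-in and not the thesis's method. The function name `crossfade_concat`, the 16 kHz sample rate and the 800-sample fade length are all assumptions made for the example.

```python
def crossfade_concat(clips, fade_len):
    """Join clips (lists of float samples), blending each boundary linearly.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    out = list(clips[0])
    for clip in clips[1:]:
        # Never fade over more samples than either side actually has.
        n = min(fade_len, len(out), len(clip))
        head, tail = out[:len(out) - n], out[len(out) - n:]
        blended = []
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming clip, between 0 and 1
            blended.append(tail[i] * (1.0 - w) + clip[i] * w)
        out = head + blended + list(clip[n:])
    return out


SAMPLE_RATE = 16000  # assumed sample rate, not stated in the record
# Three constant one-second clips stand in for generated samples.
clips = [[0.5] * SAMPLE_RATE for _ in range(3)]
joined = crossfade_concat(clips, fade_len=800)
```

Each boundary overlaps the two clips by `fade_len` samples, so the result is shorter than the sum of the clip lengths by `fade_len` samples per join. A crossfade only smooths the transition. It does not synthesize new material at the boundary, which is what inpainting with a generative model would do.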

BASTIANELLO, EDOARDO

2022/2023

	Faculty/Department
	
				Dipartimento di Ingegneria dell'Informazione - DEI
			
	Degree program
	
				Computer Engineering, Laurea Magistrale (D.M. 270/2004)
			
	Academic year
	
				2022
			
	English title
	
				Sound Generation using GAN Models
			
	Keywords
	
				GAN
				Audio
				Sound Generation
				AI
				Deep Learning
			
	Supervisor
	
				RODA', ANTONIO
			
	Co-supervisor
	
				CANAZZA TARGON, SERGIO
			
	Appears in collections:
	
				Master's theses

Files in this item:

File	Size	Format
Bastianello_Edoardo.pdf (open access)	6.3 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/57997