Sound Generation using GAN Models

BASTIANELLO, EDOARDO
2022/2023

Abstract

In recent years, the use of artificial intelligence to generate audio has increased noticeably. This master's thesis presents an approach to sound generation that combines multiple GAN models with post-processing techniques to produce diverse samples of different durations. The GAN models used include SpecGAN, Catch-A-Waveform (CAW) and UNAGAN. The proposed pipeline consists of the following components:
- SpecGAN, employed to generate one-second inharmonic samples.
- UNAGAN, used to generate variable-length harmonic samples.
- Post-processing, applied to minimize the noise in the generated samples.
- Post-processing combined with CAW, used to concatenate and inpaint multiple samples generated with SpecGAN or UNAGAN.
- CAW, which, starting from a single reference sample, is used to generate longer and cleaner samples.
Experiments were conducted on both harmonic and inharmonic sound datasets. The results show that the pipeline can generate variable-length sounds, both harmonic and inharmonic, with quality comparable to that of real samples.
2022
GAN
Audio
Sound Generation
AI
Deep Learning
Files in this item:
File: Bastianello_Edoardo.pdf (open access)
Size: 6.3 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/57997