Sound Generation using GAN Models
BASTIANELLO, EDOARDO
2022/2023
Abstract
In recent years, the use of artificial intelligence to generate audio has increased noticeably. This master's thesis presents an approach to sound generation that combines multiple GAN models with post-processing techniques to generate diverse samples of varying duration. The GAN models used include SpecGAN, Catch-A-Waveform (CAW) and UNAGAN. The proposed pipeline consists of the following components:

- SpecGAN, employed to generate one-second inharmonic samples.
- UNAGAN, used to generate variable-length harmonic samples.
- Post-processing, applied to minimize the amount of noise in the generated samples.
- Post-processing and CAW, used together to concatenate and inpaint multiple samples generated with SpecGAN or UNAGAN.
- CAW, which, starting from a single reference sample, is used to generate longer and cleaner samples.

Experiments were conducted using both harmonic and inharmonic sound datasets. The results demonstrated that this pipeline allows for the generation of variable-length sounds, both harmonic and inharmonic, with quality comparable to that of real samples.

File | Size | Format
---|---|---
Bastianello_Edoardo.pdf (open access) | 6.3 MB | Adobe PDF
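The abstract mentions concatenating several short generated samples into a longer one. As a rough illustration of that idea only, the sketch below joins clips with a linear crossfade in NumPy. This is not the method of the thesis, which relies on CAW for inpainting the joins; the function name `crossfade_concat`, the 16 kHz sample rate, and the 50 ms fade length are all assumptions chosen for the example.

```python
import numpy as np


def crossfade_concat(clips, sample_rate=16000, fade_seconds=0.05):
    """Join 1-D audio clips, blending each boundary with a linear crossfade.

    Hypothetical helper: a simple stand-in for the concatenation step,
    not the CAW-based inpainting described in the abstract.
    """
    fade_len = int(sample_rate * fade_seconds)
    out = np.asarray(clips[0], dtype=np.float64)
    for clip in clips[1:]:
        clip = np.asarray(clip, dtype=np.float64)
        # The overlap cannot be longer than either clip.
        n = min(fade_len, len(out), len(clip))
        if n == 0:
            out = np.concatenate([out, clip])
            continue
        ramp = np.linspace(0.0, 1.0, n)
        # Fade the tail of the running signal out while the next clip fades in.
        overlap = out[-n:] * (1.0 - ramp) + clip[:n] * ramp
        out = np.concatenate([out[:-n], overlap, clip[n:]])
    return out


# Usage: two one-second placeholder clips (assumed 16 kHz) standing in
# for generator outputs.
a = np.ones(16000)
b = np.ones(16000)
joined = crossfade_concat([a, b])
```

Each join shortens the result by the overlap length, so two one-second clips with a 50 ms fade yield slightly less than two seconds of audio. A linear ramp keeps the summed amplitude constant for identical signals; an equal-power ramp would be a common alternative for uncorrelated material.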
The text of this website is © Università degli Studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/57997