Restoration Of The Damaged Sound Recordings Using Generative Adversarial Networks (GANs)

AKSOY, ADNAN KEREM
2023/2024

Abstract

Recent advances in Generative Adversarial Networks (GANs) have achieved remarkable success in various fields, including image synthesis, video generation, and natural language processing. This thesis explores the application of GANs to audio processing, focusing on the reconstruction and enhancement of corrupted audio signals. The main objective is to use GANs to learn the intricate patterns of clean and corrupted audio data and thereby generate reliable, harmonic audio reconstructions from corrupted inputs. Initially, the audio track of an old film containing corrupted passages was used as the dataset for both training and testing. To train the GAN more effectively, a dataset containing both real and corrupted versions of the same sound samples was then used. Training is carried out on this dataset, and the old film’s audio is used to test the GAN’s results. The audio samples are preprocessed in two ways, by extracting Mel-Frequency Cepstral Coefficients (MFCCs) and by computing the Short-Time Fourier Transform (STFT), to see which representation works better. Each representation serves as the input to different GAN implementations in order to find the most suitable one. The audio signals reconstructed by the different GAN architectures are evaluated in two ways: by comparing the spectrograms of the original and reconstructed signals, and by listening to and comparing both signals. The aim of this thesis is for the GAN-based model to learn effectively and to reconstruct high-quality audio signals from corrupted inputs, making it a promising tool for applications in audio enhancement and restoration. This research contributes to the growing field of audio processing with GANs, providing insights and methodologies for future work on enhancing audio quality with deep learning techniques.
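The spectrogram comparison described in the abstract can be illustrated with a minimal sketch. It is not the code used in the thesis: the frame length, hop size, toy sine-wave signal, simulated dropout, and the log-spectral distance score are all assumptions chosen for illustration (the abstract only states that spectrograms are compared, not how). The function names `stft_magnitude` and `log_spectral_distance` are hypothetical. Only the Python standard library is used so that the example runs anywhere; in practice an FFT-based routine from a numerical library would replace the naive DFT.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a list of magnitude spectra, one per Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    bins = frame_len // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(bins):
            # Naive DFT: adequate for a sketch, an FFT would be used in practice.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def log_spectral_distance(spec_a, spec_b, eps=1e-10):
    """Mean over frames of the RMS difference of log-magnitude spectra (dB).

    Assumed metric for illustration; the thesis does not name one.
    """
    total = 0.0
    for frame_a, frame_b in zip(spec_a, spec_b):
        squared = [(20 * math.log10(a + eps) - 20 * math.log10(b + eps)) ** 2
                   for a, b in zip(frame_a, frame_b)]
        total += math.sqrt(sum(squared) / len(squared))
    return total / len(spec_a)


# Toy stand-ins for real recordings: a 440 Hz tone and a copy with a dropout.
sample_rate = 8000
clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(1024)]
corrupted = list(clean)
corrupted[400:500] = [0.0] * 100  # simulated damaged segment

spec_clean = stft_magnitude(clean)
spec_corrupted = stft_magnitude(corrupted)

print(log_spectral_distance(spec_clean, spec_clean))      # identical signals
print(log_spectral_distance(spec_clean, spec_corrupted))  # damaged copy
```

A score like this only complements the listening comparison mentioned in the abstract: two signals can have a small spectral distance and still sound different, so it should not be read as a measure of perceived quality.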
2023
Neural Network
Adversarial
Machine Learning
AI
Sound File
Files in this item:
File: AKSOY_ADNAN_KEREM.pdf
Access: open access
Size: 8.77 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/75150