Restoration Of The Damaged Sound Recordings Using Generative Adversarial Networks (GANs)

AKSOY, ADNAN KEREM

2023/2024

Abstract

Recent advances in Generative Adversarial Networks (GANs) have achieved remarkable success in fields such as image synthesis, video generation, and natural language processing. This thesis explores the application of GANs to audio processing, focusing on the reconstruction and enhancement of corrupted audio signals. The main objective is to use GANs to learn the patterns of clean and corrupted audio data and thereby generate reliable harmonic audio reconstructions from corrupted inputs.

To achieve this, the audio of an old film containing corrupted parts was used as a dataset. To train the GAN more effectively, however, a separate dataset containing both clean ("real") and corrupted versions of the same sound samples was used: training was carried out on this paired dataset, and the old film’s audio was used to test the GAN’s results.

The audio samples are preprocessed in two ways, by extracting Mel-Frequency Cepstral Coefficients (MFCCs) and by computing the Short-Time Fourier Transform (STFT), to see which representation works better. Each representation serves as the input to different GAN architectures in order to find the most suitable one.

The reconstructed audio signals obtained with the different GAN architectures are evaluated in two ways: by comparing the spectrograms of the original and reconstructed signals, and by listening to and comparing the two.

The thesis aims for a GAN-based model that effectively learns to reconstruct high-quality audio signals from corrupted inputs, which would make it a promising tool for audio enhancement and restoration. The research contributes to the growing field of audio processing with GANs, providing insights and methodologies for future work on enhancing audio quality with deep learning techniques.
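The STFT preprocessing and the spectrogram comparison mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the frame length, hop size, Hann window, the synthetic test tone, and the log-spectral distance used as a numeric stand-in for comparing spectrograms are all assumptions made for illustration, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude STFT: one list of frame_len // 2 + 1 bin magnitudes per frame.

    Uses a naive DFT for clarity; a real pipeline would use an FFT library.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def log_spectral_distance(spec_a, spec_b, eps=1e-8):
    """Mean over frames of the RMS difference between log-magnitude spectra (dB)."""
    total = 0.0
    for fa, fb in zip(spec_a, spec_b):
        sq = [
            (20 * math.log10(a + eps) - 20 * math.log10(b + eps)) ** 2
            for a, b in zip(fa, fb)
        ]
        total += math.sqrt(sum(sq) / len(sq))
    return total / len(spec_a)


# Hypothetical example data: a clean 1 kHz tone and a copy with a dropout.
sr = 8000  # assumed sample rate in Hz
clean = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(512)]
corrupted = list(clean)
corrupted[200:264] = [0.0] * 64  # simulate a damaged segment

spec_clean = stft_magnitude(clean)
spec_corrupted = stft_magnitude(corrupted)
print("frames x bins:", len(spec_clean), "x", len(spec_clean[0]))
print("distance clean vs clean:", log_spectral_distance(spec_clean, spec_clean))
print("distance clean vs corrupted:", log_spectral_distance(spec_clean, spec_corrupted))
```

In a restoration setting, the same distance could be computed between a reference recording and a reconstruction, where a smaller value would suggest that the two spectrograms are closer; it complements, rather than replaces, the listening comparison.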

Record

	Faculty/Department
	
				Dipartimento di Ingegneria dell'Informazione - DEI
			
	Degree programme
	
				ICT FOR INTERNET AND MULTIMEDIA - INGEGNERIA PER LE COMUNICAZIONI MULTIMEDIALI E INTERNET, Laurea Magistrale (master's degree, D.M. 270/2004)
			
	Academic year
	
				2023
			
	English title
	
				Restoration Of The Damaged Sound Recordings Using Generative Adversarial Networks (GANs)
			
	Keywords
	
				Neural Network
				Adversarial
				Machine Learning
				AI
				Sound File
			
	Supervisor
	
				CANAZZA TARGON, SERGIO
			
	Appears in collections:
	
				Master's theses

Files in this item:

File	Size	Format
AKSOY_ADNAN_KEREM.pdf (open access)	8.77 MB	Adobe PDF

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/75150