Deep Learning-Based Noise Removal Techniques for Audio-Driven IoT Systems

COBANBAS, SEZGI

2025/2026

Abstract

This thesis investigates the challenge of background noise interference in real-world environments, focusing on the development of robust detection and reduction frameworks to enhance audio signal accuracy. TIScode is a technology developed by OGENUS S.R.L. that enables contactless data transmission using audio as the communication medium. Short audio clips are generated via generative AI (MusicGen), encoded with a unique Code Number Reference (CNR), and broadcast through speakers to be captured and decoded by smartphone microphones. Because these signals are transmitted in real-world environments, they are susceptible to urban background noise and other interference, which can significantly degrade decoding accuracy. The primary objective is to evaluate how effectively clean TIScode target signals can be distinguished from complex, non-stationary background noise using a range of signal processing and deep learning methods. To support a rigorous evaluation, a custom dataset was synthesized by mixing clean TIScode audio recordings with real-world environmental noise samples from the UrbanSound8K dataset across a wide range of Signal-to-Noise Ratio (SNR) levels, simulating varying degrees of degradation. The research implements and evaluates a suite of denoising benchmarks, including improved spectral subtraction, two-pass spectral subtraction, and Wiener filtering, along with a proposed hybrid spectral subtraction-Wiener approach. These classical techniques were compared against state-of-the-art neural architectures, specifically Demucs-based source separation, U-Net, and Attention U-Net. Performance was quantified using a set of evaluation metrics, including SNR improvement and Scale-Invariant Signal-to-Distortion Ratio (SI-SDR). The experimental results reveal a distinct performance trade-off between classical signal processing and deep learning architectures.
The source separation and deep learning models demonstrated superior isolation capabilities in extreme noise conditions, successfully reconstructing signals that were otherwise unintelligible. However, their performance saturates as the input SNR improves, with some models introducing artifacts and even degrading signal quality in high-SNR scenarios. In contrast, improved spectral subtraction provides the most consistent performance across all conditions, offering a stable and predictable enhancement profile without the risk of over-processing. These findings suggest that deep learning methods are best suited for low-SNR environments, while optimized classical approaches remain more reliable for preserving signal fidelity in cleaner conditions. This supports a scenario-dependent strategy for the real-world deployment of the TIScode system, and of related applications such as telecommunications, hearing aids, and speech recognition systems.
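
The dataset construction and the two named metrics can be illustrated with a minimal NumPy sketch. It is not taken from the thesis: the function names (`mix_at_snr`, `measured_snr_db`, `si_sdr`) are hypothetical, the signals are synthetic stand-ins for TIScode clips and UrbanSound8K noise, and the SI-SDR follows the commonly used zero-mean, optimally scaled definition, which may differ in detail from the thesis's implementation.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + scaled noise has the requested SNR (dB)."""
    noise = np.resize(noise, clean.shape)  # repeat or trim noise to match length
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

def measured_snr_db(reference, estimate):
    """SNR of `estimate` with respect to `reference`, in dB."""
    error = estimate - reference
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))

def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio, in dB."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to find the optimal scaling.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    return 10 * np.log10((np.sum(target ** 2) + eps) / (np.sum(residual ** 2) + eps))

# Synthetic stand-ins: a 1 kHz tone as the "clean" clip, white noise as interference.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 1000.0 * t)
noise = rng.standard_normal(8000)
mixture = mix_at_snr(clean, noise, snr_db=5.0)
```

SNR improvement for a denoiser would then be the difference between `measured_snr_db(clean, denoised)` and `measured_snr_db(clean, mixture)`. SI-SDR is included alongside it because it is unaffected by rescaling of the estimate, which plain SNR is not.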

Record

	Faculty/Department
				Dipartimento di Matematica "Tullio Levi-Civita" - DM

	Degree programme
				Data Science, Master's degree (Laurea Magistrale, D.M. 270/2004)

	Academic year
				2025

	English title
				Deep Learning-Based Noise Removal Techniques for Audio-Driven IoT Systems

	Keywords
				Audio Processing
				Noise Reduction
				Deep Learning
				Internet of Things

	Supervisor
				BADIA, LEONARDO

	Appears in collections:
				Master's theses (Lauree magistrali)

Files in this item:

File	Access	Size	Format
Sezgi_Cobanbas_Thesis.pdf	open access	8.37 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/108225