AI-driven generation and classification of short sound messages for the Internet of Audio Things

FAVERO, MANUELE
2023/2024

Abstract

This thesis explores the automated generation and classification of short sound messages, referred to as TIScodes, for the Internet of Audio Things (IoAuT). A TIScode is a brief audio sequence, lasting 4 to 5 seconds, that carries digital information recognizable by a dedicated smartphone application. The work develops methodologies for each stage of the TIScode pipeline: generation, transmission, and, finally, reception and decoding. For the generation phase, MusicGen, a state-of-the-art autoregressive transformer model, is proposed, together with an analysis of its degrees of freedom aimed at maximizing the variety of distinct audio tracks it can generate. In addition, a channel coding scheme is introduced, based on the quantization of sound features together with high-level features extracted by convolutional neural networks (CNNs). These features are mapped to a unique bitmap for each TIScode, simplifying the decoding process. For the recognition phase, an algorithm is presented that combines sound-feature analysis with frequency-based peak analysis to improve detection accuracy. Experimental results from simulations and field tests demonstrate the effectiveness of the system in retrieving the digital information encoded in the sound messages.
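The recognition algorithm described in the abstract combines sound-feature analysis with frequency-based peak analysis. As an illustration of the peak-analysis half only (a minimal sketch, not the implementation from the thesis), the dominant spectral peaks of a received audio frame can be located from a plain FFT magnitude spectrum; the function name and parameters below are hypothetical:

```python
import numpy as np

def spectral_peaks(signal, sample_rate, n_peaks=5):
    """Return the n_peaks most prominent frequencies in a signal.

    Illustrative sketch: compute the real-FFT magnitude spectrum and
    pick the strongest bins. A real decoder would add windowing,
    noise floor estimation, and peak-prominence checks.
    """
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    # Indices of the strongest magnitude bins, largest first.
    top = np.argsort(spectrum)[::-1][:n_peaks]
    return sorted(float(f) for f in freqs[top])

# Synthetic example: one second of two tones at 440 Hz and 880 Hz.
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
peaks = spectral_peaks(x, sr, n_peaks=2)  # ≈ [440.0, 880.0]
```

With a one-second frame the FFT bin width is 1 Hz, so the two tones land on exact bins; in a field recording the peaks would spread over neighboring bins and need grouping.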
InternetOfAudioThing
Signal Processing
Generative Models
Classification Model
InformationSound
Files in this item:

File: Favero_Manuele.pdf
Access: open access
Size: 7.01 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/73442