Interactive Music Generation from Text and Audio Prompts Using a Fine-Tuned Transformer Model

Vizzapu, Prakash
Academic year 2025/2026

Abstract

This thesis presents the design and development of an interactive system for generating music from both text and audio inputs, built on Tasty-MusicGen-Small, a fine-tuned transformer-based model developed at the University of Padova. The system supports multimodal input: users can provide either a textual description or an audio prompt to generate coherent musical output. The application is implemented in Python and delivered through a web-based interface built with Gradio, enabling real-time interaction and output delivery in WAV format. The backend pipeline integrates Hugging Face's Transformers library and supports GPU acceleration through CUDA for efficient inference and high-quality synthesis. Key contributions include dual-modality support (text-to-audio and audio-to-audio), robust preprocessing and handling of audio data, and a lightweight API for experimentation and creative exploration. This work contributes to the growing field of generative AI in music, providing tools that bridge the gap between human expression and machine-generated sound.
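
The record contains no implementation code, but the pipeline the abstract describes (a MusicGen-family checkpoint served through Hugging Face's Transformers library, CUDA acceleration when available, WAV output, and both text and audio conditioning) can be sketched with the public MusicGen API. The snippet below is a minimal, hypothetical sketch: the facebook/musicgen-small checkpoint stands in for Tasty-MusicGen-Small, whose Hugging Face identifier is not given in this record, and the prompt text, the prompt.wav input file, and the token budgets are illustrative assumptions.

```python
import numpy as np
import scipy.io.wavfile
import torch
from transformers import AutoProcessor, MusicgenForConditionalGeneration

# Public stand-in: the hub identifier of the thesis model
# (Tasty-MusicGen-Small) is not given in this record.
MODEL_ID = "facebook/musicgen-small"
device = "cuda" if torch.cuda.is_available() else "cpu"  # CUDA acceleration when available

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = MusicgenForConditionalGeneration.from_pretrained(MODEL_ID).to(device)

# Text-to-audio: condition generation on a textual description.
inputs = processor(
    text=["mellow jazz trio with brushed drums and upright bass"],
    padding=True,
    return_tensors="pt",
).to(device)
audio = model.generate(**inputs, max_new_tokens=512)

# Write the result as WAV, the delivery format the abstract mentions.
rate = model.config.audio_encoder.sampling_rate  # 32 kHz for MusicGen
scipy.io.wavfile.write("generated.wav", rate=rate, data=audio[0, 0].cpu().numpy())

# Audio-to-audio: condition on an audio prompt (hypothetical prompt.wav,
# expected as mono at the model's 32 kHz rate; resample first otherwise).
prompt_rate, prompt_wave = scipy.io.wavfile.read("prompt.wav")
inputs = processor(
    audio=prompt_wave.astype(np.float32) / 32768.0,  # int16 PCM -> float in [-1, 1]
    sampling_rate=prompt_rate,
    text=["continue in the same style"],
    padding=True,
    return_tensors="pt",
).to(device)
continuation = model.generate(**inputs, max_new_tokens=256)
```

At MusicGen's 50 Hz codebook frame rate, max_new_tokens=512 corresponds to roughly ten seconds of generated audio.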
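The Gradio delivery layer the abstract mentions can be hedged in the same way. The sketch below reuses the stand-in checkpoint; the single Textbox-to-Audio layout and the interface title are assumptions, not the thesis's actual UI.

```python
import gradio as gr
import torch
from transformers import AutoProcessor, MusicgenForConditionalGeneration

MODEL_ID = "facebook/musicgen-small"  # stand-in, as above
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = MusicgenForConditionalGeneration.from_pretrained(MODEL_ID).to(device)

def generate(prompt: str):
    """Synthesize audio from a text prompt and return it for playback."""
    inputs = processor(text=[prompt], padding=True, return_tensors="pt").to(device)
    audio = model.generate(**inputs, max_new_tokens=512)
    rate = model.config.audio_encoder.sampling_rate
    # Gradio's Audio component accepts a (sample_rate, waveform) tuple.
    return rate, audio[0, 0].cpu().numpy()

demo = gr.Interface(
    fn=generate,
    inputs=gr.Textbox(label="Describe the music"),
    outputs=gr.Audio(label="Generated audio"),
    title="Interactive Music Generation",
)

if __name__ == "__main__":
    demo.launch()  # serves the web UI locally
```

Launching serves a local web page where each submitted prompt triggers one generation pass and returns the result as playable audio.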
Keywords: Multimodal Music Generation; Transformer-Based Audio …; Text-to-Audio Synthesis
File in this record: Thesis.pdf, Adobe PDF, 1.47 MB (restricted access)

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/106864