"Riffusion Meets Emotions: Deep Learning with Stable Diffusion for Emotionally Expressive Music Composition"
ZARE, MOHAMMAD MEHDI
2023/2024
Abstract
In this work, I present the fine-tuning of the Riffusion model with DreamBooth, guided by the DEAM (Database for Emotional Analysis of Music) dataset, to enhance emotion-based music generation. Using software frameworks and computational resources accessible via Google Colab, I ran three distinct experiments in which key hyperparameters, such as spectrogram resolution, batch size, learning rate schedule, and regularization, were varied. The goal was to condition the model to synthesize spectrograms that accurately correspond to localized target emotions while retaining high overall musical quality. The experimental results support this, showing incremental gains in loss stability and spectrogram clarity with each configuration. In particular, the final experiment converged more stably and overfitted less, thanks to a cosine learning rate scheduler and the introduction of weight decay. Nevertheless, several issues, including prominent noise artifacts, unstable loss curves, and a relatively small and unbalanced dataset, prevented consistently high-quality outputs. These results demonstrate both the promise and the challenges of using diffusion models for emotion-driven music composition. I conclude by highlighting directions for future work, including larger datasets, increased computational resources, improved denoising, and diffusion model architecture design, to explore the full potential of generating emotionally convincing music.
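The improvement reported for the final experiment comes from pairing a cosine learning rate schedule with weight decay. The sketch below is a minimal, hypothetical illustration of that optimizer setup in PyTorch; the model, learning rate, batch size, and step count are placeholder assumptions, not values taken from the thesis.

```python
# Minimal sketch (not the thesis code): AdamW with weight decay plus a cosine
# learning-rate schedule, the combination credited with more stable convergence
# and reduced overfitting in the final experiment.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(512, 512)              # stand-in for the fine-tuned network
optimizer = AdamW(model.parameters(),
                  lr=1e-6,                     # assumed fine-tuning learning rate
                  weight_decay=1e-2)           # regularization noted in the abstract
scheduler = CosineAnnealingLR(optimizer, T_max=1000)  # assumed total training steps

for step in range(1000):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 512)).pow(2).mean()   # placeholder loss
    loss.backward()
    optimizer.step()
    scheduler.step()                           # decay the learning rate along a cosine curve
```

If the thesis relied on the Hugging Face diffusers DreamBooth training script (an assumption, not stated in the abstract), roughly analogous settings are exposed there as command-line options such as `--lr_scheduler cosine` and `--adam_weight_decay`.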
File | Access | Size | Format
---|---|---|---
Zare_Mohammad Mehdi.pdf | open access | 2.9 MB | Adobe PDF
https://hdl.handle.net/20.500.12608/76999