Soundscape generation with stable diffusion using abstract emotion descriptors

SANISOGLU, MEHMET
2023/2024

Abstract

This thesis explores the generation of soundscapes and audio sequences with the Riffusion model, which uses Stable Diffusion techniques to produce spectrograms, conditioned here on abstract emotion signifiers such as valence and arousal scores. Using the DEAM dataset, the project creates spectrograms from song files and trains a new model that generates audio from emotion vectors supplied in discrete form, as labels clustered in the valence-arousal latent space. Emotion-driven soundscapes have significant implications for the field of Internet and Communication Technologies for Internet and Multimedia. By generating audio that reflects specific emotional states, this work can enhance multimedia applications with more immersive and personalized user experiences, which in turn can improve user engagement and application effectiveness. The research contributes to the growing body of work on emotion-aware multimedia, presenting a novel approach to audio generation that bridges the gap between computational models and human emotional experience. The findings underscore the potential of combining advanced machine learning techniques with multimedia content creation, paving the way for new applications in various domains of internet and communication technologies.
Keywords: soundscape, stable diffusion, spectrogram, emotion, audio
Files in this item:
File: Sanisoglu_Mehmet.pdf (restricted access)
Size: 1.36 MB
Format: Adobe PDF

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/73144