Computational generation of music: an approach based on Deep Learning

PRAVISHI, GAURI
2022/2023

Abstract

Generative artificial intelligence (AI) has shown immense potential in creative domains such as music composition and audio synthesis. This thesis explores the application of generative AI techniques, focusing on the WaveRNN model, to music and audio synthesis. WaveRNN is a recurrent neural network that operates directly on audio waveforms and excels at capturing the temporal dependencies and complex structures present in audio data. This research investigates how WaveRNN can be leveraged to generate high-quality, coherent music and audio samples. The approach trains the WaveRNN model on a large dataset of audio recordings, enabling it to learn from and generate new audio samples that resemble the training data. In addition, the thesis explores incorporating time and frequency constraints into the generation process to enforce specific musical guidelines and artistic preferences. To evaluate the effectiveness of the proposed approach, a series of experiments is conducted using different datasets and constraint settings. The generated samples are assessed for quality, fidelity to the training data, and adherence to the specified constraints. A comparative analysis further evaluates WaveRNN against other existing methods for music and audio synthesis. The results demonstrate the potential of WaveRNN to generate realistic and diverse music and audio samples, and the time and frequency constraints provide an avenue for creating music that adheres to specific guidelines, yielding output that is both creative and controlled. These findings contribute to the advancement of generative AI in music and audio synthesis, opening up new possibilities for creative expression and innovative composition techniques.
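The abstract describes WaveRNN as a recurrent network that generates audio sample by sample, conditioning each new sample on what came before. The following is a minimal sketch of that autoregressive loop, not the thesis's actual model: all layer sizes, the plain-RNN cell, and the randomly initialized weights are illustrative stand-ins for a trained network, and only the 256-level (8-bit) quantized output follows the WaveRNN paper.

```python
import numpy as np

# Hypothetical sketch of WaveRNN-style autoregressive generation: at each
# step the network consumes the previous audio sample, updates a recurrent
# hidden state, and emits a distribution over quantized amplitude levels.

rng = np.random.default_rng(0)

HIDDEN = 32        # recurrent state size (illustrative)
QUANT = 256        # 8-bit quantized amplitude levels, as in WaveRNN

# Randomly initialized weights stand in for a trained model.
W_x = rng.normal(scale=0.1, size=(HIDDEN, 1))       # input -> hidden
W_h = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))  # hidden -> hidden
W_o = rng.normal(scale=0.1, size=(QUANT, HIDDEN))   # hidden -> logits

def softmax(z):
    z = z - z.max()           # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def generate(n_samples):
    """Sample n_samples quantized audio values autoregressively."""
    h = np.zeros(HIDDEN)      # recurrent hidden state
    x = 0.0                   # previous sample, scaled to [-1, 1]
    out = []
    for _ in range(n_samples):
        h = np.tanh(W_x @ np.array([x]) + W_h @ h)  # simple RNN cell
        probs = softmax(W_o @ h)                    # categorical output
        q = rng.choice(QUANT, p=probs)              # sample one level
        x = 2.0 * q / (QUANT - 1) - 1.0             # de-quantize as next input
        out.append(q)
    return np.array(out)

samples = generate(100)
```

With random weights the output is noise; the point is the generation loop itself, in which training only changes the weight matrices, not the sampling procedure.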
Keywords: Music Generation, WaveRNN, Neural Networks, Creative AI, Audio Synthesis
Files in this record:
File: Gauri_Pravishi.pdf (restricted access), 916.17 kB, Adobe PDF

The text of this website © Università degli studi di Padova. Full Text are published under a non-exclusive license. Metadata are under a CC0 License

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/58013