LLMs for Understanding and Preserving Historical Musical Codes: Code Collection, Curation & Dataset
MASHHADI, ARASH
2024/2025
Abstract
Historical computer music compositions, particularly those written in specialized programming languages, risk being lost as their communities disperse and their software environments evolve. This thesis explores the potential of large language models (LLMs) as automated tools for preserving the interpretation of musical code, focusing on Csound, an important but vulnerable domain-specific language. We developed a curated corpus of Csound projects spanning two decades, covering a range of synthesis techniques, and ensured that the projects are executable through thorough preprocessing and validation. Our approach combines full-context analysis, domain-aware prompting that targets synthesis semantics, schema-first outputs that produce metadata which is both human-readable and machine-parseable, and audio grounding against spectrograms. Evaluation through algorithmic verification and multi-judge LLM analysis indicates that full-context methods significantly outperform segmented ones in understanding architectural relationships within the code. The findings highlight the value of combining diverse evaluation philosophies for robust quality assessment and identify the remaining challenges posed by complex signal routing and custom abstractions. The study also establishes methodological frameworks for digital musicology and for the preservation of endangered programming languages, and all materials are released as open source to promote scalable LLM-assisted preservation as human expertise diminishes.

| File | Size | Format |
|---|---|---|
| Mashhadi_Arash.pdf (open access) | 3.91 MB | Adobe PDF |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/99277