LLMs for Understanding and Preserving Historical Musical Codes: Code Collection, Curation & Dataset

MASHHADI, ARASH
2024/2025

Abstract

Historical computer music compositions, particularly those written in specialized programming languages, are at risk of being lost as their communities disperse and software environments evolve. This thesis explores the potential of large language models as automated tools for interpreting and preserving musical code, focusing on Csound, a key but vulnerable domain-specific language. We developed a curated corpus of Csound projects spanning two decades, covering a variety of synthesis techniques and ensuring executability through thorough preprocessing and validation. Our approach combines full-context analysis, domain-aware prompting for synthesis semantics, schema-first outputs that yield human-readable and machine-parseable metadata, and audio grounding against spectrograms. Evaluation through algorithmic verification and multi-judge LLM analysis indicates that full-context methods significantly surpass segmented ones in understanding architectural relationships. The findings highlight the value of combining diverse evaluation philosophies for robust quality assessment and clarify the challenges posed by complex signal routing and custom abstractions. This study also establishes methodological frameworks for digital musicology and the preservation of endangered programming languages, and releases all materials as open source to promote scalable LLM-assisted preservation methods in light of diminishing human expertise.
2024
Large Language Model
Prompt Engineering
Model Evaluation
LLM code interpreter
Music Legacy Code
Files in this item:
Mashhadi_Arash.pdf (open access), 3.91 MB, Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/99277