Multi-Task Automated Classification of Imaging Acquisition Labels from DICOM Metadata using Large Language Models
VIAN, BEATRICE
2025/2026
Abstract
Medical images are stored as DICOM files whose header metadata describe acquisition characteristics and scanner-specific parameters. While these metadata are fundamental for downstream data analysis in the clinical field, especially in radiology, they are often incomplete and inconsistently populated, partly because of differing vendor conventions. This variability can hinder the interpretation of imaging semantics. This project investigated the feasibility of predicting and standardizing MRI and PET acquisition labels directly from heterogeneous textual DICOM header metadata using Large Language Models (LLMs). It focused on the automatic classification of attributes related to image formation: sequence and contrast type, acquisition plane, modality, PET tracer, manufacturer, device model, and the anatomical region imaged. LLMs are well suited to this task because they can robustly parse varied textual inputs, handling both partially structured and free-text DICOM attributes, and map them to a standardized, controlled label space. The study comprised a series of experiments testing how well both general-purpose GPT-based models and the open-source LLaMA 3 model could classify these metadata. First, an out-of-the-box (OOB) approach established a baseline. Fine-tuning was then performed for both the GPT and LLaMA models to improve results by supervising the learning process with domain-specific insights. While GPT models demonstrated strong zero-shot and fine-tuned accuracy via the OpenAI API platform, LLaMA achieved better results in supervised classification settings. The results highlighted the impact of manufacturer-specific nomenclature and the importance of both structured and free-text DICOM fields for the models' ability to extract the correct information.
Finally, the work explored an extension of the framework using the multimodal MedGemma model in a supervised learning setting, integrating imaging pixel data with textual metadata. Overall, this thesis demonstrated that LLMs can efficiently learn relevant imaging semantics from DICOM metadata alone, preserving privacy and providing a scalable approach to metadata enrichment in radiology workflows.
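To make the classification task concrete, the following is a minimal illustrative sketch, not the thesis's actual pipeline: it shows how heterogeneous free-text DICOM header fields (e.g. `SeriesDescription`, `ProtocolName`) might be mapped to a small controlled label space with hand-written rules, and how the same fields could be formatted into a prompt for an LLM classifier. All field values, keyword rules, and label names here are hypothetical examples.

```python
# Illustrative sketch (assumed, not the thesis's actual method): a rule-based
# baseline mapping vendor-specific DICOM header text to a controlled sequence
# label, of the kind an LLM classifier is meant to generalize beyond.

CONTROLLED_SEQUENCES = {"T1", "T2", "FLAIR", "DWI", "UNKNOWN"}

# Hypothetical vendor-specific series-description tokens -> controlled label.
SEQUENCE_RULES = [
    (("mprage", "bravo", "tfl3d"), "T1"),
    (("flair",), "FLAIR"),            # check FLAIR before T2: names often contain both
    (("t2", "tse"), "T2"),
    (("dwi", "diffusion", "ep2d_diff"), "DWI"),
]

def classify_sequence(header: dict) -> str:
    """Return a controlled sequence label from free-text DICOM header fields."""
    text = " ".join(
        str(header.get(tag, "")) for tag in ("SeriesDescription", "ProtocolName")
    ).lower()
    for keywords, label in SEQUENCE_RULES:
        if any(k in text for k in keywords):
            return label
    return "UNKNOWN"

def build_prompt(header: dict) -> str:
    """Format selected header fields into a prompt for an LLM classifier."""
    fields = ("Modality", "Manufacturer", "SeriesDescription", "ProtocolName")
    lines = [f"{t}: {header.get(t, '<missing>')}" for t in fields]
    return (
        f"Classify the acquisition sequence as one of {sorted(CONTROLLED_SEQUENCES)}.\n"
        + "\n".join(lines)
    )
```

A rule table like this breaks down precisely where the thesis situates the problem: vendor conventions multiply faster than keywords can be enumerated, which motivates delegating the mapping to a model that parses the free text directly.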
| File | Access | Size | Format |
|---|---|---|---|
| Vian_Beatrice.pdf | open access | 6.31 MB | Adobe PDF |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/107655