Applications of Large Language Models in Document Analysis and Automation

MAZZA, DAVIDE
2024/2025

Abstract

This thesis explores the application of Large Language Models (LLMs), such as GPT and Gemini, to automate the transcription of medical reports. Developed in collaboration with Elty DaVinci, the project addresses inefficiencies in manual data transcription, a process that is both time-intensive and error-prone. By integrating LLM APIs and evaluating against a manually curated ground truth, the system is designed to ensure accurate extraction and transcription of key clinical values. LOINC codes are used as a standard for semantic consistency, which makes the outputs interoperable across healthcare systems. The methodology includes:
• API integration for data extraction.
• An experimental setup to validate outputs against the ground truth.
• A comparison framework to evaluate model accuracy.
• Data preprocessing techniques, such as converting PDF reports into images to improve extraction.
Results show significant reductions in transcription time and improvements in accuracy compared with traditional methods. Future work includes implementing predictive analytics to support early diagnosis and trend analysis of patient data. This work contributes to digital health by streamlining clinical workflows and improving data quality.
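The abstract mentions a comparison framework that checks model outputs against a manually curated ground truth, with clinical values identified by LOINC codes. The thesis's actual scoring rules are not described here, so the following is only a minimal sketch under stated assumptions: each report is reduced to a mapping from LOINC code to a (value, unit) pair, and an extracted value counts as correct only if its unit matches exactly and its number matches within a relative tolerance. The names `compare`, `Score`, `Record`, and the sample data are hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical record format (an assumption, not the thesis's schema):
# LOINC code -> (numeric value, unit string).
Record = dict[str, tuple[float, str]]


@dataclass
class Score:
    correct: int   # code present, unit and value match
    wrong: int     # code present, but unit or value differs
    missing: int   # code in ground truth, absent from model output
    spurious: int  # code in model output, absent from ground truth

    @property
    def accuracy(self) -> float:
        # Share of ground-truth values reproduced correctly; spurious
        # extractions are reported separately and do not enter this ratio.
        total = self.correct + self.wrong + self.missing
        return self.correct / total if total else 0.0


def compare(extracted: Record, ground_truth: Record, rel_tol: float = 1e-6) -> Score:
    """Compare one report's extracted values against its ground truth."""
    correct = wrong = missing = 0
    for code, (gt_value, gt_unit) in ground_truth.items():
        if code not in extracted:
            missing += 1
            continue
        value, unit = extracted[code]
        # Units are compared exactly after trimming whitespace.
        same_unit = unit.strip() == gt_unit.strip()
        if same_unit and math.isclose(value, gt_value, rel_tol=rel_tol):
            correct += 1
        else:
            wrong += 1
    # Codes the model produced that the ground truth does not contain.
    spurious = len(extracted.keys() - ground_truth.keys())
    return Score(correct, wrong, missing, spurious)


# Illustrative data only; not taken from the thesis.
GROUND_TRUTH: Record = {
    "2345-7": (95.0, "mg/dL"),   # glucose, serum/plasma
    "718-7": (13.5, "g/dL"),     # hemoglobin, blood
    "2093-3": (180.0, "mg/dL"),  # total cholesterol, serum/plasma
}
EXTRACTED: Record = {
    "2345-7": (95.0, "mg/dL"),   # matches the ground truth
    "718-7": (15.3, "g/dL"),     # value differs from the ground truth
    "4548-4": (5.6, "%"),        # HbA1c, not in the ground truth
}

score = compare(EXTRACTED, GROUND_TRUTH)
print(score, f"accuracy={score.accuracy:.2f}")
```

Keeping spurious extractions out of the accuracy ratio is a design choice of this sketch: it separates "how many reference values were reproduced" from "how much the model hallucinated", which a real evaluation might instead fold into precision and recall.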
2024
Keywords: Large Language Model; Automated Workflows; RAG
The text of this website © Università degli studi di Padova. Full Text are published under a non-exclusive license. Metadata are under a CC0 License

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/84787