Applications of Large Language Models in Document Analysis and Automation

This thesis explores the application of Large Language Models (LLMs), such as GPT and Gemini, to automate the transcription of medical reports. Developed in collaboration with Elty DaVinci, the project aims to address inefficiencies in manual data transcription processes that are both time-intensive and error-prone. By integrating LLM APIs and leveraging a manually curated ground truth, the system ensures accurate extraction and transcription of key clinical values. Using LOINC codes as a standard for semantic consistency, the outputs are made interoperable across healthcare systems.

The methodology includes:
• API integration for data extraction.
• Experiment setup to validate outputs against a ground truth.
• A comparison framework to evaluate model accuracy.
• Data preprocessing techniques, such as converting PDF reports into images for better extraction.

Results highlight significant reductions in transcription time and improvements in accuracy compared to traditional methods. Future prospects include the implementation of predictive analytics to support early diagnosis and trend analysis in patient data. This work contributes to advancing digital health solutions by streamlining clinical workflows and enhancing data quality.
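The comparison step mentioned in the abstract (validating extracted values against a manually curated ground truth) could look roughly like the sketch below. This is not the thesis implementation, which is not publicly available: the `Observation` type, the `score` function, the tolerance, and the sample values are all hypothetical, and the LOINC codes are used only as illustrative keys.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One clinical value, keyed by a LOINC code (hypothetical structure)."""
    loinc: str
    value: float
    unit: str


def score(extracted, ground_truth, rel_tol=1e-6):
    """Compare extracted observations with the ground truth, matching on LOINC code.

    An extracted value counts as correct only if its code is in the ground
    truth, the unit matches exactly, and the number agrees within rel_tol.
    """
    truth = {o.loinc: o for o in ground_truth}
    found = {o.loinc: o for o in extracted}
    correct = sum(
        1
        for code, t in truth.items()
        if code in found
        and found[code].unit == t.unit
        and math.isclose(found[code].value, t.value, rel_tol=rel_tol)
    )
    precision = correct / len(found) if found else 0.0
    recall = correct / len(truth) if truth else 0.0
    return {"correct": correct, "precision": precision, "recall": recall}


# Made-up sample data: one correct value, one wrong value, one extra value.
ground_truth = [
    Observation("2345-7", 92.0, "mg/dL"),
    Observation("718-7", 14.1, "g/dL"),
]
extracted = [
    Observation("2345-7", 92.0, "mg/dL"),
    Observation("718-7", 13.5, "g/dL"),
    Observation("2160-0", 0.9, "mg/dL"),
]

result = score(extracted, ground_truth)
print(result)
```

Keying on the code rather than on the printed test name is one way to make results from differently worded reports comparable; a real evaluation would likely also need unit normalization, which this sketch leaves out.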

MAZZA, DAVIDE

2024/2025

Record

	Faculty/Department
	
				Dipartimento di Matematica "Tullio Levi-Civita" - DM
			
	Degree programme
	
				DATA SCIENCE, Master's degree (Laurea Magistrale, D.M. 270/2004)
			
	Academic year
	
				2024
			
	English title
	
				Applications of Large Language Models in Document Analysis and Automation
			
	Keywords
	
				Large Language Model
Automated Workflows
RAG
			
	Supervisor
	
				SUSTO, GIAN ANTONIO
			
	Appears in categories:
	
				Master's theses

Files in this item:

File	Size	Format
Davide_Mazza_Tesi (2) (1).pdf (restricted access)	1.81 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/84787