Information Extraction from Financial Reports with Retrieval-Augmented Generation and Multimodal Large Language Models

Financial reports contain vast amounts of information that stakeholders, analysts, and automated decision-making systems depend on. Extracting accurate, structured information from these documents is crucial for enabling downstream Natural Language Processing (NLP) tasks such as question answering, fact verification, risk assessment, and automated report generation. However, their inherent complexity, extreme length, and mix of structured and unstructured data pose significant challenges for traditional Information Extraction (IE) methods.
This thesis explores the application of advanced NLP techniques, specifically Retrieval-Augmented Generation (RAG) and Multimodal Large Language Models (MLLMs), to enhance the efficiency and accuracy of information extraction from these complex documents. We investigate the integration of structured retrieval mechanisms with generative language models to improve factual consistency and reduce the risk of hallucinations. Additionally, we address the limitations of text-only models by incorporating visual document understanding, enabling the models to process tabular data, charts, and layout-dependent context commonly found in financial disclosures.
Through comprehensive experiments on benchmark datasets and real-world financial documents, we evaluate the proposed systems against traditional approaches using metrics such as Hit Rate, Mean Reciprocal Rank (MRR), and Normalized Discounted Cumulative Gain (NDCG). Our experimental results suggest that integrating RAG with MLLMs substantially improves the quality and reliability of the extracted information. This, in turn, leads to enhanced performance in domain-specific applications such as document question answering, where the precision and contextual accuracy of the extracted content are critical.
This research contributes to the development of more scalable, robust, and intelligent systems for financial document analysis and lays a foundation for future advancements in automated understanding and reasoning over complex financial documents.

COMIN, MASSIMILIANO

2024/2025

	Faculty/Department
	
				Dipartimento di Ingegneria dell'Informazione - DEI
			
	Degree programme
	
				COMPUTER ENGINEERING, Master's degree (Laurea Magistrale, D.M. 270/2004)
			
	Academic year
	
				2024
			
	Keywords
	
				LLM
				RAG
				Multimodal Models
				Data Extraction
				Financial Documents
			
	Supervisor
	
				SATTA, GIORGIO
			
	Appears in collections:
	
				Master's theses

Files in this item:

File	Size	Format
Comin_Massimiliano.pdf (Open Access from 11/07/2026)	9.66 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/87076