Information Extraction from Financial Reports with Retrieval-Augmented Generation and Multimodal Large Language Models
COMIN, MASSIMILIANO
2024/2025
Abstract
Financial reports contain vast amounts of critical information essential for stakeholders, analysts, and automated decision-making systems. Extracting accurate and structured information from these documents is crucial for enabling downstream Natural Language Processing (NLP) tasks such as question answering, fact verification, risk assessment, and automated report generation. However, the inherent complexity, extreme length, and mix of structured and unstructured data in these reports pose significant challenges for traditional Information Extraction (IE) methods. This thesis explores the application of advanced NLP techniques, specifically Retrieval-Augmented Generation (RAG) and Multimodal Large Language Models (MLLMs), to improve the efficiency and accuracy of information extraction from these complex documents. We investigate the integration of structured retrieval mechanisms with generative language models to improve factual consistency and reduce the risk of hallucinations. Additionally, we address the limitations of text-only models by incorporating visual document understanding, enabling the models to process tabular data, charts, and layout-dependent context commonly found in financial disclosures. Through comprehensive experiments on benchmark datasets and real-world financial documents, the thesis evaluates the proposed systems against traditional approaches using metrics such as Hit Rate, Mean Reciprocal Rank (MRR), and Normalized Discounted Cumulative Gain (NDCG). Our experimental results suggest that integrating RAG with MLLMs substantially improves the quality and reliability of the extracted information. This, in turn, leads to better performance in domain-specific applications such as document question answering, where the precision and contextual accuracy of the extracted content are critical.
This research contributes to the development of more scalable, robust, and intelligent systems for financial document analysis and lays a foundation for future advancements in automated understanding and reasoning over complex financial documents.
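The abstract names Hit Rate, MRR, and NDCG without defining them. As a minimal sketch, assuming binary relevance (a retrieved page or chunk either contains the evidence for a question or does not) and a fixed cutoff k, the standard definitions can be computed as below. The function names, the page IDs, and the cutoff are hypothetical illustrations; the thesis itself may use graded relevance, a different k, or a different averaging scheme.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def hit_rate_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1.0 if any relevant item appears among the top-k results, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1 / (1-based rank of the first relevant item in the top k), or 0.0 if none."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: a relevant item at rank r contributes 1 / log2(r + 1)."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    # Ideal DCG: every relevant item placed at the top of the list (at most k of them).
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def evaluate(runs: List[Tuple[List[str], Set[str]]], k: int) -> Dict[str, float]:
    """Average each metric over all queries (one (ranked, relevant) pair per query)."""
    n = len(runs)
    return {
        "hit_rate": sum(hit_rate_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank_at_k(r, rel, k) for r, rel in runs) / n,
        "ndcg": sum(ndcg_at_k(r, rel, k) for r, rel in runs) / n,
    }


# Hypothetical toy run: page IDs returned by a retriever for two questions.
runs = [
    (["p3", "p7", "p1"], {"p7"}),  # relevant page retrieved at rank 2
    (["p2", "p5", "p9"], {"p4"}),  # relevant page not retrieved
]
metrics = evaluate(runs, k=3)
print(metrics)
```

For this toy run, Hit Rate@3 is 0.5 and MRR is 0.25, since the only hit is at rank 2 of the first query; NDCG@3 is about 0.315 because that hit is discounted by 1/log2(3). Hit Rate only asks whether the evidence was retrieved at all, MRR rewards placing the first relevant item early, and NDCG additionally accounts for every relevant item in the top k, which is why they are often reported together for retrieval in RAG pipelines.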
| File | Size | Format |
|---|---|---|
| Comin_Massimiliano.pdf (embargoed until 10/07/2026) | 9.66 MB | Adobe PDF |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/87076