Information Extraction from Financial Reports with Retrieval-Augmented Generation and Multimodal Large Language Models

COMIN, MASSIMILIANO
2024/2025

Abstract

Financial reports contain vast amounts of critical information essential for stakeholders, analysts, and automated decision-making systems. Extracting accurate and structured information from these documents is crucial for enabling downstream Natural Language Processing (NLP) tasks such as question answering, fact verification, risk assessment, and automated report generation. However, their inherent complexity, extreme length, and mix of structured and unstructured data pose significant challenges for traditional Information Extraction (IE) methods. This thesis explores the application of advanced NLP techniques, specifically Retrieval-Augmented Generation (RAG) and Multimodal Large Language Models (MLLMs), to enhance the efficiency and accuracy of information extraction from these complex documents. We investigate the integration of structured retrieval mechanisms with generative language models to improve factual consistency and reduce the risk of hallucinations. Additionally, we address the limitations of text-only models by incorporating visual document understanding, enabling the models to process tabular data, charts, and layout-dependent context commonly found in financial disclosures. Through comprehensive experiments on benchmark datasets and real-world financial documents, the thesis evaluates the performance of the proposed systems against traditional approaches using metrics such as Hit Rate, Mean Reciprocal Rank (MRR), and Normalized Discounted Cumulative Gain (NDCG). Our experimental results suggest that integrating RAG with MLLMs substantially improves the quality and reliability of the extracted information. This, in turn, leads to enhanced performance in domain-specific applications such as document question answering, where the precision and contextual accuracy of the extracted content are critical.
This research contributes to the development of more scalable, robust, and intelligent systems for financial document analysis and lays a foundation for future advancements in automated understanding and reasoning over complex financial documents.
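
The retrieval metrics named in the abstract can be illustrated with a minimal sketch. It assumes binary relevance (a passage is either relevant to a query or not) and a fixed cutoff k; the abstract does not state the exact evaluation setup used in the thesis, so the function names and the passage identifiers below are hypothetical and serve only to show how Hit Rate, MRR, and NDCG are typically computed.

```python
import math
from typing import List, Sequence, Set, Tuple


def hit_rate_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Return 1.0 if at least one relevant passage is in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Return 1 / rank of the first relevant passage, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """NDCG@k with binary gains (assumption: gain is 1 for relevant, 0 otherwise)."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    # Ideal ranking places all relevant passages first, up to the cutoff.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def evaluate(runs: List[Tuple[Sequence[str], Set[str]]], k: int) -> dict:
    """Average the three metrics over a list of (ranked passages, relevant set) pairs."""
    n = len(runs)
    return {
        "hit_rate": sum(hit_rate_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
        "ndcg": sum(ndcg_at_k(r, rel, k) for r, rel in runs) / n,
    }


# Hypothetical example: two queries with made-up passage identifiers.
runs = [
    (["p3", "p7", "p1"], {"p7"}),  # first relevant passage at rank 2
    (["p2", "p9"], {"p2"}),        # first relevant passage at rank 1
]
scores = evaluate(runs, k=3)
```

Hit Rate only records whether any relevant passage was retrieved, MRR rewards placing the first relevant passage early, and NDCG discounts every relevant passage by its rank, so the three are usually reported together.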
2024
LLM
RAG
Multimodal Models
Data Extraction
Financial Documents
Files in this item:
File: Comin_Massimiliano.pdf (embargoed until 10/07/2026)
Size: 9.66 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/87076