Experimental Study on Retrieval-Augmented Generation: Engineering and Evaluation of a Custom RAG system for Open-Domain QA

ANTOLINI, GIANLUCA
2024/2025

Abstract

This thesis presents the design, implementation, and evaluation of a custom Retrieval-Augmented Generation (RAG) system for Open-Domain Question Answering (QA). The work builds on the TREC RAG 2024 Track and combines traditional and innovative information retrieval methods with local large language models (LLMs) to build a flexible and efficient end-to-end pipeline. The retrieval component is based on the Pyserini framework and uses the datasets, evaluation data, and tools provided by TREC (e.g. the MS MARCO Segment v2.1 collection). Various retrieval strategies were explored, including BM25, query expansion, pseudo-relevance feedback, and re-ranking techniques. For the generation component, multiple local LLMs were tested under different prompting strategies and configurations, with particular attention to performance optimization through quantization, GPU acceleration, and fine-tuning. The results were then compared with outputs from state-of-the-art hosted LLMs to assess relative quality and performance. The thesis also introduces a preliminary experiment with Parametric RAG (PRAG), an approach presented in a paper published in January 2025 in which context is integrated into the model parameters instead of, or in addition to, the prompt input. The results highlight how different combinations of retrieval and generation techniques affect the relevance and quality of the final answers. This experimental study contributes to the practical understanding of building customized, efficient, and interpretable RAG systems using open-source tools and local models.
2024
RAG
IR
LLM
Files in this item:
File: main.pdf (open access)
Size: 4.63 MB
Format: Adobe PDF

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/86949