Enhancing Corporate Document Management Systems with AI: Leveraging Embeddings and Semantic Search for Efficient Retrieval and Chat-based Assistance

GHORBANI, NAZANIN
2023/2024

Abstract

Recent advances in large language models (LLMs) and Retrieval-Augmented Generation (RAG) techniques have changed how information is managed across industries, enabling intelligent, efficient, and context-aware document interaction systems. This thesis applies these technologies to corporate document management and presents a novel chatbot solution designed to improve document retrieval and give users accurate, context-driven assistance. Building on a retrieval framework based on embeddings and semantic search, the RAG approach is used to combine the best-performing embedding model with five distinct LLMs. These models generate context-aware responses, and their performance is evaluated on response quality and alignment with user expectations. Metrics such as generation time are also analyzed to assess system efficiency. The thesis demonstrates how integrating LLMs, RAG, and advanced embedding techniques can transform corporate document management by providing reliable, scalable, and responsive access to knowledge. By detailing the system's architecture, methodology, evaluation metrics, and performance benchmarks, this work highlights the potential of LLM-powered, RAG-driven solutions to support efficient and contextually relevant user interactions across industries.
2023
RAG systems
Embedding Models
LLMs
Semantic Search
Vector Databases
File in this record: thesis_nazanin_ghorbani.pdf (open access, 4.95 MB, Adobe PDF)

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/80201