Enhancing Corporate Document Management Systems with AI: Leveraging Embeddings and Semantic Search for Efficient Retrieval and Chat-based Assistance

GHORBANI, NAZANIN

2023/2024

Abstract

Recent advancements in large language models (LLMs) and Retrieval-Augmented Generation (RAG) techniques have revolutionized information management across industries, enabling the development of intelligent, efficient, and contextually aware document interaction systems. This thesis explores the application of these technologies within the domain of corporate document management, presenting a novel chatbot solution designed to enhance document retrieval and provide accurate, context-driven assistance to users. Building on this robust retrieval framework, the Retrieval-Augmented Generation (RAG) approach is used to integrate the best embedding model with five distinct LLMs. These models generate context-aware responses, with their performance evaluated based on response quality and alignment with user expectations. Metrics such as generation time are also analyzed to assess system efficiency. This thesis demonstrates how the integration of LLMs, RAG, and advanced embedding techniques can transform corporate document management by providing reliable, scalable, and responsive access to knowledge. By detailing the system’s architecture, methodology, evaluative metrics, and performance benchmarks, this work highlights the potential for deploying LLM-powered, RAG-driven solutions that enable efficient and contextually relevant user interactions across industries.

Record

	Faculty/Department
	
				Dipartimento di Matematica "Tullio Levi-Civita" - DM
			
	Degree programme
	
				Computer Science, Laurea Magistrale (master's degree, D.M. 270/2004)
			
	Academic year
	
				2023
			
	English title
	
				Enhancing Corporate Document Management Systems with AI: Leveraging Embeddings and Semantic Search for Efficient Retrieval and Chat-based Assistance
			
	Keywords
	
				RAG systems
				Embedding Models
				LLMs
				Semantic Search
				Vector Databases
			
	Supervisor
	
				AIOLLI, FABIO
			
	Appears in document types:
	
				Master's theses (Lauree magistrali)

Files in this item:

File	Size	Format
thesis_nazanin_ghorbani.pdf (open access)	4.95 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/80201