Offline LLM‑Powered RAG Chatbot for Enterprise Knowledge Retrieval: Design and Evaluation
HABIBI, HANNANE
2024/2025
Abstract
In collaboration with Infonet Solutions SRL, this thesis delivers an offline, on-premises Retrieval-Augmented Generation (RAG) assistant that answers employees’ questions from internal MediaWiki and Znuny/OTRS databases while preserving confidentiality and data residency. An authenticated nginx gateway fronts a FastAPI service that performs dense retrieval in a vector store and local generation via Ollama (Llama 3.x Instruct). The knowledge base is built from periodic MariaDB snapshots into a normalized, schema-aware export; documents are deterministically chunked, embedded with bge-m3, and indexed with explicit references. The design prioritizes groundedness and operations: all inference is local, responses follow an “answer-from-context or refuse” policy with similarity thresholds, storage and embeddings reside on snapshot-friendly persistent volumes, and the ETL supports incremental refresh with per-citation lineage. Functional tests on representative enterprise queries indicate that the system can provide concise answers with generally correct citations and reduced time-to-information compared to existing tools in many cases. The thesis discusses limitations and outlines future work, including SSO with group-to-filter mapping, hybrid dense+BM25 retrieval, and GPU serving. The result is a replicable, CPU-first blueprint for secure enterprise RAG over MediaWiki and Znuny/OTRS.
File: Habibi_Hannane.pdf (open access, 627.93 kB, Adobe PDF)
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/102086