Offline LLM‑Powered RAG Chatbot for Enterprise Knowledge Retrieval: Design and Evaluation

HABIBI, HANNANE
2024/2025

Abstract

In collaboration with Infonet Solutions SRL, this thesis delivers an offline, on-premises Retrieval-Augmented Generation (RAG) assistant that answers employees’ questions from internal MediaWiki and Znuny/OTRS databases while preserving confidentiality and data residency. An authenticated nginx gateway fronts a FastAPI service that performs dense retrieval over a vector store and local generation via Ollama (Llama 3.x Instruct). The knowledge base is built from periodic MariaDB snapshots into a normalized, schema-aware export; documents are deterministically chunked, embedded with bge-m3, and indexed with explicit references. The design prioritizes groundedness and operability: all inference is local, responses follow an “answer-from-context or refuse” policy with similarity thresholds, storage and embeddings reside on snapshot-friendly persistent volumes, and the ETL supports incremental refresh with per-citation lineage. Functional tests on representative enterprise queries indicate that the system provides concise answers with generally correct citations and, in many cases, reduced time-to-information compared with existing tools. The thesis discusses limitations and outlines future work, including SSO with group-to-filter mapping, hybrid dense+BM25 retrieval, and GPU serving. The result is a replicable, CPU-first blueprint for secure enterprise RAG over MediaWiki and Znuny/OTRS.
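To make the retrieval-gating idea concrete, the sketch below is a minimal illustrative Python fragment, not code from the thesis: the names (Hit, build_prompt), the 0.60 cosine-similarity cutoff, the top-k of 4, and the refusal message are all assumptions standing in for whatever the system actually uses. It shows the shape of an “answer-from-context or refuse” policy: retrieved chunks below the similarity threshold are discarded, and if none survive, the assistant refuses rather than letting the model guess.

from dataclasses import dataclass

SIM_THRESHOLD = 0.60  # assumed cutoff; the thesis tunes its own value
TOP_K = 4             # assumed number of chunks passed to the model

@dataclass
class Hit:
    """One retrieved chunk with its provenance and similarity score."""
    text: str
    source: str        # e.g. a MediaWiki page title or Znuny ticket id
    similarity: float  # cosine similarity reported by the vector store

def build_prompt(question: str, hits: list):
    """Return a grounded prompt for the LLM, or None to signal refusal."""
    ranked = sorted(hits, key=lambda h: h.similarity, reverse=True)[:TOP_K]
    kept = [h for h in ranked if h.similarity >= SIM_THRESHOLD]
    if not kept:
        return None  # nothing relevant enough: refuse instead of guessing
    context = "\n\n".join(f"[{h.source}] {h.text}" for h in kept)
    return ("Answer ONLY from the context below and cite sources in brackets. "
            "If the context is insufficient, say you do not know.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")

if __name__ == "__main__":
    hits = [Hit("Install the VPN client, then import the profile ...", "wiki:VPN_Setup", 0.82),
            Hit("Replaced toner in the 2nd-floor printer.", "ticket:2024-1138", 0.31)]
    prompt = build_prompt("How do I configure the VPN client?", hits)
    print(prompt or "I cannot answer this from the internal knowledge base.")

Only chunks that pass the gate reach the locally served model, which is what keeps answers grounded in MediaWiki/Znuny content and makes refusal preferable to hallucination when retrieval confidence is low.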
Keywords: Large Language Model; RAG; Chatbot; LLM; Vector Search
Full text: Habibi_Hannane.pdf (open access, Adobe PDF, 627.93 kB)

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/102086