Offline LLM‑Powered RAG Chatbot for Enterprise Knowledge Retrieval: Design and Evaluation
HABIBI, HANNANE
2024/2025
Abstract
In collaboration with Infonet Solutions SRL, this thesis delivers an offline, on-premises Retrieval-Augmented Generation (RAG) assistant that answers employees’ questions from internal MediaWiki and Znuny/OTRS databases while preserving confidentiality and data residency. An authenticated nginx gateway fronts a FastAPI service that performs dense retrieval in a vector store and local generation via Ollama (Llama 3.x Instruct). The knowledge base is built from periodic MariaDB snapshots into a normalized, schema-aware export; documents are deterministically chunked, embedded with bge-m3, and indexed with explicit references. The design prioritizes groundedness and operations: all inference is local, responses follow an “answer-from-context or refuse” policy with similarity thresholds, storage and embeddings reside on snapshot-friendly persistent volumes, and the ETL supports incremental refresh with per-citation lineage. Functional tests on representative enterprise queries indicate that the system can provide concise answers with generally correct citations and reduced time-to-information compared to existing tools in many cases. The thesis discusses limitations and outlines future work, including SSO with group-to-filter mapping, hybrid dense+BM25 retrieval, and GPU serving. The result is a replicable, CPU-first blueprint for secure enterprise RAG over MediaWiki and Znuny/OTRS.
File: Habibi_Hannane.pdf (open access, 627.93 kB, Adobe PDF)
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/102086