Building Secure Conversational Agents: Architectural Choices, Performance Evaluation, and Threat Modeling in RAG
CALIGIURI, GIORGIO
2025/2026
Abstract
This thesis presents the design, implementation, and cybersecurity evaluation of a secure, on-premise Retrieval-Augmented Generation (RAG) conversational agent tailored for a museum environment. To ensure data sovereignty and operate within a strict 36 GB VRAM hardware constraint, the open-weight Qwen3.5-35B-A3B model was selected. The system optimizes latency by decoupling offline document ingestion from real-time generation, employing a hybrid search pipeline that combines Maximal Marginal Relevance (MMR) with BM25, refined by a cross-encoder reranker. This architecture achieves a 72.23% retrieval success rate while maintaining a strict operational latency of 1.4 seconds per query. Crucially, deploying RAG architectures shifts traditional security defenses to a novel "semantic perimeter". This study conducts a rigorous threat modeling assessment, evaluating inference- and data-phase attack vectors such as micro-scale data poisoning (PoisonedRAG), automated jailbreaking (GPTFUZZER), and Denial-of-Service exploits. To secure the infrastructure, the research proposes multi-layered countermeasures, including data provenance, retrieval-native access controls, and a Dual LLM pattern.

| File | Size | Format |
|---|---|---|
| Caligiuri Giorgio master thesis.pdf (restricted access) | 835.31 kB | Adobe PDF |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/108077