Building Secure Conversational Agents: Architectural Choices, Performance Evaluation, and Threat Modeling in RAG


CALIGIURI, GIORGIO

2025/2026

Abstract

This thesis presents the design, implementation, and cybersecurity evaluation of a secure, on-premise Retrieval-Augmented Generation (RAG) conversational agent tailored to a museum environment. To ensure data sovereignty and operate within a strict 36 GB VRAM hardware constraint, the open-weight Qwen3.5-35B-A3B model was selected. The system reduces latency by decoupling offline document ingestion from real-time generation and employs a hybrid search pipeline that combines Maximal Marginal Relevance (MMR) with BM25, refined by a cross-encoder reranker. This architecture achieves a 72.23% retrieval success rate while maintaining a strict operational latency of 1.4 seconds per query. Crucially, deploying RAG architectures shifts traditional security defenses to a novel "semantic perimeter". This study conducts a rigorous threat modeling assessment, evaluating inference-phase and data-phase attack vectors such as micro-scale data poisoning (PoisonedRAG), automated jailbreaking (GPTFUZZER), and Denial of Service exploits. To secure the infrastructure, the research proposes multi-layered countermeasures, including data provenance, retrieval-native access controls, and a Dual LLM pattern.
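As an illustration of the hybrid retrieval step named in the abstract, the following minimal Python sketch ranks documents with BM25 and with a greedy MMR ordering, then merges the two rankings. It is not the thesis implementation: term-count vectors stand in for dense embeddings, reciprocal rank fusion is an assumed way to combine the two rankings, the cross-encoder reranking stage is omitted, and all function names and parameter defaults are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would use a proper analyzer.
    return text.lower().split()


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return document indices sorted by BM25 score, best first."""
    toks = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in toks) / n
    df = Counter(term for t in toks for term in set(t))
    scores = []
    for t in toks:
        tf = Counter(t)
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(t) / avgdl)
            s += idf * tf[term] * (k1 + 1) / norm
        scores.append(s)
    return sorted(range(n), key=lambda i: -scores[i])


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mmr_rank(query, docs, lam=0.7):
    """Greedy MMR ordering over term-count vectors (assumed stand-in for embeddings)."""
    q = Counter(tokenize(query))
    vecs = [Counter(tokenize(d)) for d in docs]
    remaining = list(range(len(docs)))
    selected = []
    while remaining:
        def mmr(i):
            # Relevance to the query minus similarity to already selected documents.
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * cosine(q, vecs[i]) - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion (an assumption, not confirmed by the abstract)."""
    scores = Counter()
    for ranking in rankings:
        for rank, idx in enumerate(ranking):
            scores[idx] += 1.0 / (k + rank + 1)
    return sorted(scores, key=lambda i: -scores[i])


def hybrid_search(query, docs, top_k=3):
    fused = rrf_fuse([bm25_rank(query, docs), mmr_rank(query, docs)])
    return [docs[i] for i in fused[:top_k]]
```

Calling `hybrid_search("who painted the fresco", docs, top_k=2)` on a small list of strings returns the two passages that both rankings place highest. In a fuller pipeline, the fused candidates would then be passed to a cross-encoder for final reranking.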

Record

Faculty/Department	Dipartimento di Matematica "Tullio Levi-Civita" - DM
Degree programme	CYBERSECURITY Laurea Magistrale (D.M. 270/2004)
Academic year	2025
English title	Building Secure Conversational Agents: Architectural Choices, Performance Evaluation, and Threat Modeling in RAG
Keywords	Generative AI; Knowledge Retrieval; RAG Architecture; Large Language Model; LLM Security
Supervisor	SALMASO, LUIGI
Appears in types	Master's theses

Files in this item:

File	Size	Format
Caligiuri Giorgio master thesis.pdf (restricted access)	835.31 kB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/108077