Development and Deployment of a Generative ChatBot on AWS: Architectural Design and Performance Analysis

BERBERI, JOI
2024/2025

Abstract

This thesis presents the design and evaluation of an internal conversational assistant based on the Retrieval-Augmented Generation (RAG) paradigm. The system is developed for a data and analytics team within a large enterprise, with the goal of providing fast, reliable access to technical and operational documentation through natural-language interaction. The assistant combines a REST API backend implemented in Python with a cloud-hosted large language model (LLM) and an internal knowledge base exported from existing documentation platforms and stored as Markdown files.

The architecture follows a modular, serverless approach. The cloud LLM, accessed via a managed API, is used both in a pure zero-shot configuration and in RAG settings where it is grounded in semantically retrieved document chunks. A vector-based retrieval layer, built on top of document embeddings, enables semantic search over the internal corpus. Chat sessions and messages are persisted in a NoSQL datastore with time-to-live (TTL) policies, and the backend exposes dedicated endpoints for conversation management and administration.

From a methodological perspective, the work defines and compares multiple chat modes: zero-shot prompting, retrieval-only RAG, and few-shot prompting combined with RAG. Functional tests and illustrative dialogues are used to analyse the behaviour of each mode, while quantitative measurements focus on latency and retrieval statistics. Where available, human feedback is used to assess the perceived usefulness, relevance, and clarity of the assistant's answers.

The results show that grounding the LLM in the internal knowledge base markedly improves the factuality and usefulness of responses compared to the zero-shot configuration, especially for questions about internal tools, processes, and runbooks. Few-shot prompting additionally yields more consistent style and structure in the answers, at the cost of slightly higher response times.

The thesis concludes by discussing the limitations of the current prototype, including the dependency on cloud connectivity, coverage gaps in the knowledge base, and the limited amount of human evaluation, and outlines directions for future development, such as richer logging, systematic user studies, and tighter integration with existing enterprise tooling.
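The abstract names the building blocks without showing code, so the following is a minimal illustrative sketch of how a vector-based retrieval layer over Markdown chunks and the three chat modes could be wired together. Every identifier here (Chunk, top_k, build_prompt, FEW_SHOT_EXAMPLES) is a hypothetical reconstruction, not code from the thesis, and chunk embeddings are assumed to be precomputed by a separate embedding model.

```python
# Minimal illustrative sketch of the retrieval layer and the three chat
# modes described in the abstract. Nothing here is taken from the thesis:
# all names are hypothetical, and embeddings are assumed precomputed.
from dataclasses import dataclass
import numpy as np


@dataclass
class Chunk:
    source: str          # originating Markdown file in the knowledge base
    text: str            # chunk contents
    vector: np.ndarray   # precomputed embedding of `text`


def top_k(query_vec: np.ndarray, chunks: list[Chunk], k: int = 4) -> list[Chunk]:
    """Semantic search: return the k chunks most cosine-similar to the query."""
    mat = np.stack([c.vector for c in chunks])
    sims = mat @ query_vec / (
        np.linalg.norm(mat, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]


FEW_SHOT_EXAMPLES = "Q: ...\nA: ..."  # placeholder for curated demonstrations


def build_prompt(mode: str, question: str, context: list[Chunk]) -> str:
    """Assemble the LLM prompt for the three chat modes compared in the thesis."""
    if mode == "zero_shot":
        return question  # model answers from parametric knowledge only
    grounding = "\n\n".join(c.text for c in context)
    if mode == "rag":
        return f"Answer using only this documentation:\n{grounding}\n\nQ: {question}"
    if mode == "few_shot_rag":
        return (f"{FEW_SHOT_EXAMPLES}\n\n"
                f"Answer using only this documentation:\n{grounding}\n\nQ: {question}")
    raise ValueError(f"unknown chat mode: {mode}")
```

A sketch like this also makes the reported trade-off plausible: the grounding text is what pins answers about internal tools and runbooks to the documentation, while the few-shot preamble standardises answer style at the price of a longer prompt, and hence slightly higher latency.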
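The abstract further states that chat sessions and messages are persisted in a NoSQL datastore with TTL policies. Given the AWS deployment in the title, a natural reading is Amazon DynamoDB, whose TTL mechanism deletes items once a designated numeric attribute (epoch seconds) lies in the past. A minimal sketch under that assumption, with a hypothetical ChatSessions table and attribute layout:

```python
# Minimal sketch of chat persistence with a TTL, assuming Amazon DynamoDB
# (the abstract says only "NoSQL datastore with TTL policies"). The table
# name "ChatSessions" and the attribute layout are hypothetical.
import time
import uuid

import boto3

TTL_SECONDS = 7 * 24 * 3600  # e.g. expire conversations after one week
table = boto3.resource("dynamodb").Table("ChatSessions")


def save_message(session_id: str, role: str, text: str) -> None:
    """Persist one chat message; DynamoDB removes it once `expires_at` passes."""
    table.put_item(
        Item={
            "session_id": session_id,                      # partition key
            "message_id": str(uuid.uuid4()),               # sort key
            "role": role,                                  # "user" or "assistant"
            "text": text,
            "expires_at": int(time.time()) + TTL_SECONDS,  # TTL attribute (epoch seconds)
        }
    )
```

Expiry via TTL keeps the conversation store self-cleaning without a scheduled purge job, which fits the serverless approach the abstract describes.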
Keywords: ChatBot, Large Language Model, Amazon Web Services
File: Maste_Thesis_JoiBerberi.pdf (Adobe PDF, 967.95 kB, restricted access)

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/102098