Engineering a RAG Chatbot for Technical Manual Navigation through Vector Search and Cloud LLM Integration

This thesis presents the design and development of a chatbot built on the Retrieval-Augmented Generation (RAG) paradigm to help users navigate and consult technical manuals. The system combines a vector-based retrieval engine, backed by ChromaDB, with a cloud-hosted large language model (LLM) accessed through the Gemini API. Users query the chatbot in natural language and receive contextually relevant, technically accurate answers drawn directly from domain-specific PDF manuals. The architecture is a modular, scalable client-server design: the backend, implemented in Python with FastAPI, handles document ingestion, semantic chunking, and vector search, while the LLM applies generative reasoning to the retrieved passages to produce coherent, user-friendly answers. On the client side, a Flutter application running on a tablet provides an interactive front end suited to real-world industrial settings. Persistent memory and efficient retrieval strategies keep responses accurate as the document base grows. Beyond implementation, the thesis addresses practical concerns such as latency, device limitations, and internet dependency, which are critical when deploying AI systems on constrained hardware in field environments. Experimental evaluations measured retrieval accuracy, response quality, and overall user experience. The results indicate that the proposed architecture balances performance, scalability, and usability, and they highlight the potential of RAG-driven chatbots to support industrial documentation access, training, and decision-making in real deployments.

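
The retrieve-then-generate flow summarized in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the thesis implementation: the fixed-size word windows stand in for semantic chunking, the bag-of-words vectors and in-memory ranking stand in for the embedding model and ChromaDB, and the assembled prompt is simply returned instead of being sent to the Gemini API. All function names (`chunk_text`, `embed`, `retrieve`, `build_prompt`) and parameter defaults are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=40, overlap=10):
    """Split text into overlapping word windows.

    Assumption: a simple stand-in for the semantic chunking mentioned above.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text):
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (stand-in for vector search)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, passages):
    """Assemble the grounded prompt that would be passed to the LLM."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the manual excerpts below.\n"
        f"{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    # Hypothetical manual excerpts, invented for this example.
    manual = [
        "To reset the pump, hold the red button for five seconds.",
        "Replace the air filter every 500 operating hours.",
        "The control panel displays error code E42 when pressure is low.",
    ]
    question = "How do I reset the pump?"
    print(build_prompt(question, retrieve(question, manual, k=1)))
```

In a deployment along the lines described above, `embed` and `retrieve` would presumably be replaced by queries against a persistent vector collection, and the string returned by `build_prompt` would be sent to the cloud-hosted LLM; the separation into small functions mirrors the modular backend the abstract describes.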