Knowledge Graph-Enhanced Retrieval Augmented Generation for LLMs
BOGAN, JASWANT SINGH
2024/2025
Abstract
Retrieval Augmented Generation (RAG) systems enhance Large Language Models (LLMs) with external knowledge but often struggle with complex reasoning and synthesizing information scattered across unstructured documents. Knowledge Graphs (KGs) offer structured representations that facilitate logical inference, yet their automated construction, especially from noisy sources like corporate documentation, remains a significant hurdle. This thesis addresses these limitations by proposing a hybrid Knowledge Graph-Retrieval Augmented Generation (KG-RAG) architecture. In this approach, LLMs are leveraged not only for final answer generation but also for the intermediate step of KG construction and enrichment directly from text. Advanced prompt engineering techniques are employed to capture complex semantic relationships and infer implicit knowledge. The core of this work is kgrag, a custom Python library developed to integrate these processes. This library incorporates algorithms for entity resolution and community detection to refine the KG structure and support query execution. The proposed KG-RAG system, powered by kgrag and integrated with LlamaIndex, was evaluated against a traditional RAG baseline using a curated Question-Answering dataset derived from corporate documents. Performance was measured using Accuracy, Precision, Recall, and F1-Score. This research highlights the synergistic potential of combining LLMs and KGs, offering a pathway to more robust, context-aware, and inferentially capable information retrieval systems for complex textual data.
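The abstract names Accuracy, Precision, Recall, and F1-Score as evaluation metrics but does not specify how they are computed over generated answers. A common token-level formulation for QA evaluation (a minimal illustrative sketch, not necessarily the definition used in the thesis) looks like this:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer
    (SQuAD-style overlap; an assumed formulation, the thesis may
    define its metrics differently)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts tokens shared by both answers.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

def accuracy(predictions, references) -> float:
    """Exact-match accuracy over a QA set (case-insensitive)."""
    matches = sum(p.strip().lower() == r.strip().lower()
                  for p, r in zip(predictions, references))
    return matches / len(references)

# Example: "the cat sat" vs "the cat" shares 2 tokens,
# giving precision 2/3 and recall 1, hence F1 = 0.8.
```

Dataset-level Precision and Recall would then be averages of the per-answer values; this is one standard convention among several.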
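The kgrag library is described as incorporating entity resolution to refine the KG, but the abstract does not detail the algorithm. As a purely illustrative sketch of the idea using only the Python standard library (the function names and the similarity threshold are assumptions, not kgrag's actual API):

```python
# Greedy string-similarity clustering of entity mentions: each new
# mention is merged into the first existing canonical entity whose
# normalised name is similar enough, otherwise it starts a new one.
from difflib import SequenceMatcher

def normalise(name: str) -> str:
    # Lowercase, drop periods, collapse whitespace.
    return " ".join(name.lower().replace(".", "").split())

def resolve_entities(mentions, threshold=0.85):
    """Return a mapping from each mention to a canonical representative."""
    canonical = []   # one representative mention per resolved entity
    mapping = {}     # mention -> representative
    for m in mentions:
        n = normalise(m)
        for c in canonical:
            if SequenceMatcher(None, n, normalise(c)).ratio() >= threshold:
                mapping[m] = c
                break
        else:
            canonical.append(m)
            mapping[m] = m
    return mapping

# e.g. resolve_entities(["ACME Corp.", "acme corp", "Beta Ltd"])
# maps "acme corp" onto "ACME Corp." and keeps "Beta Ltd" separate.
```

A production system would typically add embedding-based similarity and type constraints before merging nodes; this sketch only shows where such a step sits in the KG-refinement pipeline.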
The text of this website © Università degli Studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/99260