A review on Graph RAG methods
AKBARI, NILA
2024/2025
Abstract
Large Language Models (LLMs), built upon the transformer architecture and capable of few-shot learning, are nevertheless constrained by their static, pre-trained knowledge and a propensity for hallucination. Retrieval-Augmented Generation (RAG) addresses these issues by grounding generation in external data. However, standard "Naive RAG" frameworks, often relying on dense passage retrieval, sequence-to-sequence denoising, and vector stores like Chroma, fail when handling complex, relational data, as they treat knowledge as an unstructured "bag of chunks." To resolve this, a new paradigm reviewed in this thesis, Graph RAG, has emerged, transforming source corpora into structured knowledge graphs. While advanced frameworks such as "From Local to Global" (GraphRAG), StructRAG, and LightRAG claim superiority over retrieval-less approaches like HOLMES or graph optimization methods like REANO, they have been developed in isolation, leaving practitioners without clear, comparative data. This thesis bridges that gap by presenting a rigorous benchmark analysis of these leading Graph RAG frameworks, using the Gemini 2.0 and 2.5 Flash-Lite models as the underlying reasoning engines. The frameworks are evaluated against Naive RAG and a Simple LLM baseline across two distinct datasets: UltraDomainCS and NarrativeQA. Performance is measured with a broad range of metrics, including answer quality (via LLM-as-a-Judge and BERTScore), cost, latency, and robustness. Our findings quantify the limitations of unstructured retrieval and the specific trade-offs of graph-based approaches. On technical data, Naive RAG proved detrimental, scoring 43.82% and falling below the Simple LLM baseline (49.20%). In contrast, LightRAG emerged as the optimal solution (92.75%) while consuming approximately 8x fewer tokens than frameworks like StructRAG. Conversely, on narrative data, StructRAG was the only robust performer, uniquely capable of bridging long-distance relationships, though at high cost. Finally, while GraphRAG provided high-quality answers, its latency rendered it non-viable for real-time applications. These results constitute the first data-driven trade-off analysis for Graph RAG, demonstrating that the choice of framework must be optimized for the specific density and cost constraints of the domain.
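To make the "bag of chunks" baseline concrete, the Python sketch below illustrates what a Naive RAG loop over a Chroma vector store can look like: chunks are embedded and indexed, the top-k matches are retrieved for a question, and a grounded prompt is assembled for the reasoning model. The collection name, toy corpus, and prompt template are illustrative assumptions and are not taken from the thesis itself.

```python
# Minimal sketch of a Naive RAG baseline: dense retrieval over a Chroma
# vector store, followed by grounded generation. Names and data are placeholders.
import chromadb

# In-memory Chroma client; Chroma applies its default sentence-embedding
# function to documents and queries unless another one is supplied.
client = chromadb.Client()
collection = client.create_collection(name="naive_rag_demo")

# Toy corpus standing in for the chunked source documents ("bag of chunks").
chunks = [
    "LightRAG builds a lightweight entity-relation graph over the corpus.",
    "StructRAG restructures documents into a task-specific hybrid format.",
    "GraphRAG summarises communities of a knowledge graph for global queries.",
]
collection.add(documents=chunks, ids=[f"chunk-{i}" for i in range(len(chunks))])

def naive_rag_prompt(question: str, k: int = 2) -> str:
    """Retrieve the top-k chunks and assemble a grounded prompt."""
    results = collection.query(query_texts=[question], n_results=k)
    context = "\n".join(results["documents"][0])
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

prompt = naive_rag_prompt("How does LightRAG index a corpus?")
# The prompt would then be sent to the underlying reasoning engine
# (e.g., a Gemini flash-lite model) to produce the final answer.
print(prompt)
```

Because retrieval in this loop scores each chunk independently, relationships that span distant chunks are easily missed, which is precisely the failure mode the graph-based frameworks compared in the thesis are designed to address.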
| File | Size | Format | Access |
|---|---|---|---|
| Thesis_NilaAkbari.pdf | 1.93 MB | Adobe PDF | open access |
https://hdl.handle.net/20.500.12608/102077