Integrating Knowledge Graphs into RAG-Based LLMs to Improve Fact-Checking

VICENTINI, ROBERTO
2024/2025

Abstract

In recent years, Large Language Models (LLMs) have transformed Natural Language Processing (NLP), achieving impressive results across many tasks. One of their main weaknesses, however, is their tendency to produce inaccurate or unsupported information, a problem known as “hallucination.” This becomes especially serious in domains where accuracy is critical, such as fact-checking. This thesis explores how LLMs can be combined with knowledge graphs through Retrieval-Augmented Generation (RAG) to make fact-checking more reliable and accurate. The main goal of the research is to understand how integrating structured knowledge sources, such as knowledge graphs, can help LLMs verify the truthfulness of information. The idea is to build a hybrid system that combines the language abilities of LLMs with the structured precision of knowledge graphs, in order to produce more trustworthy and accurate answers.

The proposed method combines Named Entity Recognition (NER) and Named Entity Linking (NEL) with SPARQL queries to the DBpedia knowledge graph. The system was evaluated on the ExpertQA dataset using both quantitative and qualitative metrics, and several approaches were compared to assess the effectiveness of the integration.

The experimental results show that adding context from knowledge graphs helps LLMs perform better in fact-checking tasks, although each model benefits in its own way. Gemini-1.5-Flash improved in balanced accuracy when given structured descriptions such as abstracts, meaning it became better at handling both true and false claims without leaning too much toward one side. GPT-4o-Mini, on the other hand, saw the largest gains in overall accuracy, especially when summaries were added to the prompt. The qualitative analysis also revealed that each model reacts differently to the type of information it receives: Gemini achieved up to ~60% balanced accuracy (+3% over the baseline) with abstracts, while GPT-4o-Mini reached a peak accuracy of ~76.8% (+8.4%) with summaries. These results highlight the need for model-specific prompt engineering strategies, since each model benefits from different types of contextual information. While Gemini-1.5-Flash favors balanced decision-making, GPT-4o-Mini is more effective at maximizing correct predictions, even at the cost of favoring the majority class. Designing the prompt around each model’s tendencies can significantly enhance overall fact-checking performance.

In conclusion, the thesis shows that integrating LLMs with structured sources such as DBpedia through a RAG architecture can improve fact-checking reliability. The main contribution of the work is a modular system that combines the natural language abilities of LLMs with the precision of knowledge graphs, helping AI become a more effective tool against misinformation.
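This page gives no implementation detail beyond the description above, but a minimal sketch of the retrieval step might look as follows. All concrete choices here are assumptions for illustration only: DBpedia Spotlight for entity recognition and linking, the public DBpedia SPARQL endpoint, and dbo:abstract as the retrieved description; the thesis may well use different tools.

    # A minimal sketch (not the thesis implementation) of the NER/NEL +
    # SPARQL retrieval step. Assumed tools: DBpedia Spotlight for entity
    # recognition/linking, the public DBpedia SPARQL endpoint for lookup.
    import requests
    from SPARQLWrapper import SPARQLWrapper, JSON

    def link_entities(text):
        """Return DBpedia resource URIs for entities mentioned in `text`."""
        resp = requests.get(
            "https://api.dbpedia-spotlight.org/en/annotate",
            params={"text": text, "confidence": 0.5},
            headers={"Accept": "application/json"},
        )
        resp.raise_for_status()
        return [r["@URI"] for r in resp.json().get("Resources", [])]

    def fetch_abstract(uri):
        """Retrieve the English dbo:abstract of a DBpedia resource, if any."""
        sparql = SPARQLWrapper("https://dbpedia.org/sparql")
        sparql.setQuery(
            'SELECT ?abs WHERE { <%s> dbo:abstract ?abs . '
            'FILTER (lang(?abs) = "en") }' % uri
        )
        sparql.setReturnFormat(JSON)
        rows = sparql.query().convert()["results"]["bindings"]
        return rows[0]["abs"]["value"] if rows else None

    # Build the structured context that accompanies the claim in the prompt.
    claim = "Padua is the capital of Italy."
    abstracts = [fetch_abstract(uri) for uri in link_entities(claim)]
    context = "\n".join(a for a in abstracts if a)
    # The LLM is then asked, given `context`, whether `claim` is supported.

Per the abstract above, the retrieved descriptions (full abstracts or condensed summaries) are then injected into the LLM prompt together with the claim to be verified.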
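Since the results distinguish balanced accuracy from plain accuracy, a small illustration with invented labels (not the thesis data) shows why the two diverge on a class-imbalanced test set — the “favoring the majority class” behavior noted above for GPT-4o-Mini:

    # Illustration with made-up labels: a model that always predicts the
    # majority class scores high plain accuracy but only chance-level
    # balanced accuracy (the mean of per-class recalls).
    from sklearn.metrics import accuracy_score, balanced_accuracy_score

    y_true = [1] * 8 + [0] * 2   # 8 supported claims, 2 refuted claims
    y_pred = [1] * 10            # always predict "supported" (majority class)

    print(accuracy_score(y_true, y_pred))           # 0.8
    print(balanced_accuracy_score(y_true, y_pred))  # 0.5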
Keywords: Large Language Model, RAG, Knowledge Graph, Fact Checking
Files in this item:
Roberto_Vicentini_master_thesis.pdf (Adobe PDF, 982.09 kB, restricted access)

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/84793