FactCheck: Knowledge Graph Fact Verification Through Retrieval-Augmented Generation Using a Multi-Model Ensemble Approach
SHAMI, FARZAD
2024/2025
Abstract
In today's era of artificial intelligence (AI) and big data, knowledge graphs (KGs) power many AI systems, search engines, and decision-support systems. Because small errors in a KG can propagate through connected systems and cause serious downstream failures, ensuring KG accuracy is a critical task. This thesis addresses that challenge by introducing FactCheck, a fact-checking system for KGs. Our method combines Retrieval-Augmented Generation (RAG) with an ensemble of language models to verify facts. FactCheck generates questions about each KG fact, retrieves relevant documents, splits them into chunks, and feeds the chunks as context to several large language models (LLMs); a majority vote with dispute resolution then decides the fact's correctness from the models' responses. We evaluated our approach on three real-world datasets (FactBench, YAGO, and DBpedia): comparing FactCheck's output against gold-standard labels, we achieved prediction performance of 90, 87, and 70 percent, respectively. On average, verifying a single fact requires processing about 1,550 tokens per LLM and takes about 7 minutes to reach a final decision; these figures characterize the system's resource usage and latency. To achieve these results, we tuned the components of the RAG pipeline, selecting the best parameters and models for document selection, embedding, and chunking through systematic testing. The system offers a reliable, scalable solution that is compatible with various KG environments and can be adapted to handle different types of facts.
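The abstract describes the verification pipeline at a high level: question generation, retrieval, chunking, multi-model prompting, and majority voting. The Python below is a minimal sketch of how such a pipeline could be wired together, not the thesis implementation; the retrieval and LLM calls are stubbed out, and every function name, default parameter, and the tie-breaking rule are illustrative assumptions.

```python
"""Minimal sketch of a FactCheck-style pipeline. All names, defaults,
and the dispute-resolution rule are illustrative assumptions."""
from collections import Counter
from dataclasses import dataclass


@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str


def generate_question(fact: Fact) -> str:
    # Step 1: turn the KG triple into a natural-language question.
    return f"Is it true that {fact.subject} {fact.predicate} {fact.obj}?"


def retrieve_documents(question: str, k: int = 5) -> list[str]:
    # Step 2: fetch the top-k relevant documents (search engine or
    # vector store). Stubbed out here.
    raise NotImplementedError("plug in a retriever")


def chunk(document: str, size: int = 500) -> list[str]:
    # Step 3: split each document into chunks. The thesis tunes the
    # chunking strategy; this fixed character window is a placeholder.
    return [document[i:i + size] for i in range(0, len(document), size)]


def ask_llm(model: str, question: str, chunks: list[str]) -> bool:
    # Step 4: prompt one LLM with the question plus retrieved context
    # and parse a True/False verdict. Stubbed out here.
    raise NotImplementedError("plug in an LLM client")


def verify(fact: Fact, models: list[str], arbiter: str) -> bool:
    question = generate_question(fact)
    chunks = [c for doc in retrieve_documents(question) for c in chunk(doc)]
    votes = Counter(ask_llm(m, question, chunks) for m in models)

    # Step 5: majority vote; on a tie (a "dispute"), one extra model
    # breaks it. The thesis's actual dispute-resolution rule may differ.
    if votes[True] == votes[False]:
        return ask_llm(arbiter, question, chunks)
    return votes[True] > votes[False]
```

With an odd number of voting models most ties never arise; the arbiter model handles the remaining disputes, which is one plausible reading of "majority vote with dispute resolution."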
File | Size | Format
---|---|---
main.pdf (open access) | 15.54 MB | Adobe PDF
https://hdl.handle.net/20.500.12608/83740