Large Language Models as Knowledge Graph Accuracy Estimators: An Exploratory Analysis

Knowledge Graphs (KGs) are gaining popularity across many real-world applications where accuracy is crucial. The current standard for accuracy evaluation is manual annotation by humans, which is costly given the large size of KGs. Recent studies have investigated how KG accuracy can be estimated efficiently through sampling, with the aim of providing a reliable assessment at reduced cost. Even so, some manual annotation is still required, albeit less of it. In this thesis, to further reduce (or even eliminate) the manual costs involved, we explore the use of Large Language Models (LLMs) as automated annotators. To this end, we apply three prompting techniques over multiple KGs to investigate the advantages and limitations of LLMs when used to evaluate the accuracy of a KG. The techniques are: Baseline, a simple and straightforward prompt used as a reference point; Iterative, which adds detailed instructions and examples to the prompt; and Stepwise, which approaches the annotation process from a different perspective, using natural language statements instead of raw KG data. Experiments on popular KGs show that Iterative prompting is more reliable, especially in the few-shot scenario, whereas Stepwise prompting improves only on the Baseline. Moreover, the high cost of Stepwise prompting makes it unsuitable for the task at hand. In summary, this study highlights the potential of LLMs for automating KG accuracy evaluation, and points to retrieval-augmented pipelines and fine-tuned models as directions for further exploration.
