Enhancing Knowledge Graph Construction from Multilingual Technical Reports using Large Language Models

FORMAGGIO, ALBERTO

2023/2024

Abstract

Switchgear maintenance is a complex task that requires expert knowledge to address a wide range of potential issues. Although technical reports can assist, they are often disorganized and difficult to navigate. Knowledge graphs offer a structured solution, but constructing them from extensive text sources is challenging. Previous research demonstrated that Large Language Models can effectively convert unstructured data into knowledge graphs; however, this effectiveness has been limited to short texts, and the models struggle with longer documents. This work aimed to address that gap by developing a two-step extraction pipeline, consisting of entity extraction followed by relationship extraction, applied in a few-shot learning context. Preprocessing techniques were used to improve data quality, and entity alignment methods were applied to reduce graph sparsity. The results show that, with proper tuning, high-quality knowledge graphs can be generated from lengthy technical reports. These findings provide a pathway for more efficient use of unstructured knowledge in multiple industrial domains, potentially reducing reliance on manual expertise and improving the organization of large-scale knowledge bases.
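The abstract does not say how entity alignment was carried out. Purely as an illustration of why alignment reduces graph sparsity, the following minimal sketch assumes alignment by normalized surface form, which is a stand-in technique and not necessarily the method used in the thesis. The function names `normalize` and `align_entities`, the relation labels, and the sample triples are all hypothetical.

```python
import unicodedata


def normalize(name: str) -> str:
    """Hypothetical canonical key: strip accents, lowercase, and
    treat hyphens and repeated whitespace as a single space."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().replace("-", " ").split())


def align_entities(triples):
    """Merge entity mentions that share a normalized key and rewrite the triples.

    The first mention seen for each key becomes the canonical name.
    Returns (aligned_triples, mapping from mention to canonical name).
    """
    canonical = {}  # normalized key -> canonical mention
    mapping = {}    # original mention -> canonical mention
    for head, _, tail in triples:
        for mention in (head, tail):
            key = normalize(mention)
            canonical.setdefault(key, mention)
            mapping[mention] = canonical[key]

    aligned, seen = [], set()
    for head, rel, tail in triples:
        merged = (mapping[head], rel, mapping[tail])
        if merged not in seen:  # drop triples that became duplicates
            seen.add(merged)
            aligned.append(merged)
    return aligned, mapping


# Illustrative triples (invented for this sketch, not taken from the thesis).
triples = [
    ("Circuit breaker", "part_of", "Switchgear"),
    ("circuit-breaker", "has_issue", "Contact wear"),
    ("switchgear", "requires", "Maintenance"),
    ("Circuit  Breaker", "part_of", "switchgear"),
]
aligned, mapping = align_entities(triples)
for triple in aligned:
    print(triple)
```

In this toy input, spelling variants of the same entity would otherwise become separate nodes with few edges each; merging them concentrates the edges on a single node, which is the sense in which alignment reduces sparsity. A real system would likely need something stronger than string normalization, for example to match terms across languages.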

Faculty/Department: Dipartimento di Ingegneria dell'Informazione - DEI
Degree programme: COMPUTER ENGINEERING Laurea Magistrale (D.M. 270/2004)
Academic year: 2023
Keywords: Large Language Model; Knowledge Graph; Multilingual data
Supervisor: SATTA, GIORGIO
Appears in types: Master's theses (Lauree magistrali)

Files in this item:

File	Size	Format
Formaggio_Alberto.pdf (restricted access)	6.01 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/77848