Enhancing Knowledge Graph Construction from Multilingual Technical Reports using Large Language Models

FORMAGGIO, ALBERTO
2023/2024

Abstract

Switchgear maintenance is a complex task that requires expert knowledge to address a wide range of potential issues. Technical reports can assist, but they are often disorganized and difficult to navigate. Knowledge graphs offer a structured alternative, but constructing them from extensive text sources is challenging. Previous research demonstrated that Large Language Models can effectively convert unstructured data into knowledge graphs; however, this performance has been limited to short texts, and the models struggle with longer documents. This work aimed to address that gap by developing a two-step extraction pipeline, consisting of entity extraction followed by relationship extraction, applied in a few-shot learning context. Preprocessing techniques were used to improve data quality, and entity alignment methods were applied to reduce graph sparsity. The results show that, with proper tuning, high-quality knowledge graphs can be generated from lengthy technical reports. These findings provide a pathway for more efficient use of unstructured knowledge in multiple industrial domains, potentially reducing reliance on manual expertise and improving the organization of large-scale knowledge bases.
2023
Large Language Model
Knowledge Graph
Multilingual data
Files in this item:
File: Formaggio_Alberto.pdf
Access: restricted
Size: 6.01 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/77848