An Experimental Study on Bidirectional Encoder Representations from Transformers (BERT) for Named Entity Recognition and Relation Extraction.

MOHAMMAD, ODAI
2023/2024

Abstract

The rapid growth of digital content across platforms such as social media, news articles, academic publications, and online forums has resulted in an overwhelming volume of unstructured textual data. Extracting meaningful information from this data is critical for numerous applications, including information retrieval, knowledge base population, and automated question-answering systems. Named Entity Recognition (NER) and Relation Extraction (RE) are essential components in this process, enabling the identification of entities and the relationships between them. However, traditional models often fall short in handling the complexities of language, particularly domain-specific terminology and intricate relational structures. This thesis explores the application of Bidirectional Encoder Representations from Transformers (BERT), a state-of-the-art pre-trained language model, to NER and RE tasks. The primary objectives are to evaluate the performance of BERT-based models on domain-specific datasets, compare them with existing state-of-the-art techniques, and develop a framework for efficient training and application of these models across various contexts. Our study involves a comprehensive experimental setup using diverse datasets, including scientific texts, to assess BERT’s ability to handle specialized vocabularies and complex relational data. The methodology includes fine-tuning BERT models for NER and RE, implementing rigorous evaluation metrics, and comparing results with other contemporary models. We focus on reproducibility and robustness, ensuring that our findings are applicable across different domains and data types. The findings reveal that while our BERT-based model may not always exceed the performance of current state-of-the-art models, it performs on par with them. Notably, it achieves this with a more straightforward design and substantially lower computational overhead. This efficiency makes it an attractive option for practical scenarios where minimizing resource use and operational costs is crucial.
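
As a rough illustration of the fine-tuning step described in the abstract, the sketch below fine-tunes a BERT checkpoint for token-level NER using the Hugging Face transformers library. The dataset (CoNLL-2003), checkpoint (bert-base-cased), and hyperparameters are illustrative assumptions only and do not reflect the thesis's exact experimental configuration.

    # Minimal sketch: fine-tuning BERT for NER with Hugging Face transformers.
    # Dataset, checkpoint, and hyperparameters are placeholders, not the thesis setup.
    from datasets import load_dataset
    from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                              DataCollatorForTokenClassification,
                              TrainingArguments, Trainer)

    dataset = load_dataset("conll2003")  # stand-in NER corpus for illustration
    label_names = dataset["train"].features["ner_tags"].feature.names

    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    model = AutoModelForTokenClassification.from_pretrained(
        "bert-base-cased", num_labels=len(label_names))

    def tokenize_and_align(batch):
        # BERT splits words into sub-tokens; keep each word's tag on its first
        # sub-token and mask the rest with -100 so the loss ignores them.
        enc = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
        all_labels = []
        for i, tags in enumerate(batch["ner_tags"]):
            word_ids = enc.word_ids(batch_index=i)
            previous, labels = None, []
            for wid in word_ids:
                if wid is None or wid == previous:
                    labels.append(-100)
                else:
                    labels.append(tags[wid])
                previous = wid
            all_labels.append(labels)
        enc["labels"] = all_labels
        return enc

    tokenized = dataset.map(tokenize_and_align, batched=True,
                            remove_columns=dataset["train"].column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="bert-ner-sketch",
                               num_train_epochs=3,
                               per_device_train_batch_size=16,
                               learning_rate=3e-5),
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["validation"],
        data_collator=DataCollatorForTokenClassification(tokenizer),
        tokenizer=tokenizer,
    )
    trainer.train()

A common way to handle the RE side in the same framework is to fine-tune a sequence-classification head over sentences in which the candidate entity pair is marked, though the exact formulation used in the thesis may differ.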

Keywords

Entity Recognition
Relation Extraction
Information Retrieval
BERT
Transformers

Files in this item:
mohammad_odai.pdf (Adobe PDF, 510.67 kB, open access)

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/66610