Multi-Span Extractive Question Answering for Named Entity Recognition
ZAMAI, ANDREW
2022/2023
Abstract
Named Entity Recognition (NER) is a Natural Language Processing (NLP) task that involves detecting and categorizing named entities in a text. Named entities can be names of people, organizations, locations, and dates, or can be defined specifically for the domain in which the NER task is adopted. NER proves helpful in a variety of applications, the most notable of which is Information Extraction, where NER enables the extraction of structured information from unstructured text. Many approaches have been explored, ranging from rule-based methods, to machine-learning algorithms such as Conditional Random Fields and Hidden Markov Models, to deep-learning systems based on Recurrent Neural Networks and Transformer-based architectures. Lately, with the advent of Large Language Models (LLMs) such as GPT and of in-context learning techniques, a new approach to NER has emerged: extracting named entities by posing questions and having the LLM fill in the answer with the one or more named entities to extract. These LLMs, employed as Generative Question Answering (GQA) models, still require careful prompt engineering, both to deal with inputs that do not fit into the context window and to produce well-formatted output. If a traditional NER system detects several text spans associated with a named entity, the generative QA model should be prompted to extract the same number of text spans. Furthermore, if a named entity does not appear, the generative LLM used for QA should not hallucinate non-existent content. Based on these considerations, NER approached as a QA task might be a viable option, although an Extractive QA method may be more appropriate than a generative one. This thesis delves into the Extractive QA approach to NER, providing a comprehensive exploration of its core concepts and methodologies.
In particular, this work first presents the operating principle of single-span EQA models, which can extract only a single span of text per question, before moving on to the design and development of a new transformer-based model that enables Multi-Span Extractive QA, implemented according to specific guidelines so that it is a natural extension of the single-span operating principle. The proposed model is then evaluated on a NER dataset named BUSTER, in which specific named entities have to be extracted from business transaction documents. Experiments and comparisons with other transformer-based NER systems, which constitute the baselines for the classical NER approach, show that this Multi-Span EQA approach to NER performs on par with them. Further investigative experiments reveal that the QA approach to NER, through the proposed model implementation, achieves better results in reduced training set scenarios. Future work will focus on developing a technique that lets the model perform Continual Learning on a sequence of NER datasets while retaining its ability to correctly answer questions from all previously encountered datasets.
https://hdl.handle.net/20.500.12608/55992