Multi-Span Extractive Question Answering for Named Entity Recognition
ZAMAI, ANDREW
2022/2023
Abstract
Named Entity Recognition (NER) is a Natural Language Processing (NLP) task that involves detecting and categorizing named entities in a text. Named entities can be names of people, organizations, locations, and dates, or can be defined specifically for the domain in which the NER task is adopted. NER proves helpful in a variety of applications, the most notable of which is Information Extraction, where NER enables the extraction of structured information from unstructured text. Many approaches have been explored, ranging from rule-based methods, to machine-learning algorithms such as Conditional Random Fields and Hidden Markov Models, to deep-learning systems based on Recurrent Neural Networks and Transformer-based architectures. Lately, with the advent of Large Language Models (LLMs) such as GPT and of in-context learning techniques, a new approach to NER has emerged: extracting named entities by posing questions and having the LLM fill in the answer with the one or more named entities to extract. These LLMs, employed as Generative Question Answering (GQA) models, still require careful prompt engineering, both to deal with inputs that do not fit into the context window and to produce well-formatted output. If a traditional NER system detects several text spans associated with a named entity, the generative QA model should be prompted to extract the same number of text spans. Furthermore, if a named entity does not appear, the generative LLM used for QA should not hallucinate non-existent content. Based on these considerations, NER approached as a QA task might be a viable option, although an Extractive QA method may be more appropriate than a generative one. This thesis delves into the Extractive QA approach to NER, providing a comprehensive exploration of its core concepts and methodologies.
In particular, this work first presents the operating principle of single-span EQA models, which can extract only a single span of text per question, before moving on to the design and development of a new transformer-based model that enables Multi-Span Extractive QA, implemented according to specific guidelines so that it is a natural extension of the single-span operating principle. The proposed model is then evaluated on a NER dataset named BUSTER, in which specific named entities have to be extracted from business transaction documents. Experiments and comparisons with other transformer-based NER systems, which constitute the baselines for the classical NER approach, show that this Multi-Span EQA approach to NER performs on par with them. Further investigative experiments reveal that the QA approach to NER, through the proposed model implementation, achieves better results in reduced training set scenarios. Future work will focus on developing a technique that lets the model perform Continual Learning on a sequence of NER datasets while retaining its ability to correctly answer questions from all previously encountered datasets.
https://hdl.handle.net/20.500.12608/55992