The use of Artificial Intelligence to solve Natural Language Processing problems: the example of BERT in a company project

Natural Language Processing (NLP) is a field of Artificial Intelligence (AI) that has expanded rapidly over the past few years. It is enough to consider the recent explosion of text-generation software such as OpenAI’s ChatGPT, released in November 2022. The aim of this work is to explain what NLP is about and which of its tasks are most useful, to provide an overview of the main machine-learning models used to solve NLP tasks, and to give an example of their application in a company project, focusing on the use of BERT during my internship at Ixly B.V. in Utrecht, Netherlands. Together with Ixly’s Data Science team, I worked on their “Interview App”, software that records the speech during a job interview and returns a report with the following features: the main topics touched upon, in terms of competencies, motivators and personality keywords; a word cloud of the most frequently used words; the number and type of questions asked; and the language style matching between the candidate and the interviewer. The “Interview App” is currently available in Dutch, so my job was to make it suitable for the Italian language. This meant that I had to find an Italian corpus of spoken text, normalize it and train my model on this new dataset. I then had to apply all the filters mentioned above to the Italian language and finally create the pipeline that computes the language analysis for Italian conversations. This pipeline uses BERT to find question types within a conversation.
To collect the information for this work, a detailed literature search was carried out on Google Scholar. Some of the main websites used for machine learning and AI were consulted as well: GitHub and Hugging Face to obtain the code to run BERT on my personal computer, the Azure API code to transcribe spoken language, and the official OpenAI website to learn about their GPT products. Python was used as the main programming language, with Visual Studio Code as the development environment. The Italian corpus KIParla, collected and released in 2019 jointly by the Universities of Bologna and Turin, was adopted as the main dataset for the Italian language.

MIGLIORE, GIULIA

2022/2023

	Faculty/Department
	
				Dipartimento di Psicologia Generale - DPG
			
	Degree programme
	
				SCIENZE PSICOLOGICHE COGNITIVE E PSICOBIOLOGICHE, first-level (bachelor's) degree (D.M. 270/2004)
			
	Academic year
	
				2022
			
	Keywords
	
				AI
				NLP
				BERT
			
	Supervisor
	
				ZORZI, MARCO
			
	Appears in collections:
	
				Bachelor's theses

Files in this item:

File	Size	Format
Migliore_Giulia.pdf (open access)	840.76 kB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/47137