Enhancing Customer Claims Analysis through Clustering and Textual Data Insights

FRISO, GIOVANNI

2023/2024

Abstract

The volume of customer claims and after-sales requests is growing so rapidly that companies can no longer handle them by hand: reading, analyzing, and classifying them is time-consuming. These tasks can be delegated to an Artificial Intelligence (AI) model that takes advantage of recent improvements in Natural Language Processing (NLP) and text mining. A clustering technique can then be applied to the model's results to further optimize claims handling. From this perspective, textual data mining supports the work of service employees by simplifying the original dataset and making the resulting clusters easier to interpret. The main goal of this work is to improve customer claim analysis by automatically detecting issues and clustering similar claims into shared topics. Keywords play an important role here, since a keyword summary helps the user focus on the essentials and route each claim to the competent technical team. It also helps catch latent issues before they become difficult to manage. This is achieved by grouping claims that share a common issue across different products, or that report the same issue on the same product across various customers and field applications. The process involves extracting significant words from the claims and identifying similarities through keyword comparison. Keywords were extracted using a Large Language Model (LLM) based on a fine-tuned Bidirectional and Auto-Regressive Transformers (BART) model, specifically designed to identify technical terms in texts. Prior to application, the model was further fine-tuned with ground truth data to better align with the project’s objectives. To make the extracted keywords manageable, they were converted into numerical vectors and then reduced in dimensionality. After these intermediate operations, a set of unsupervised and semi-supervised clustering techniques was evaluated.
Initially, different agglomerative and density-based clustering methods were applied in an unsupervised manner. Later, a semi-supervised approach was employed using k-means clustering, with the optimal number of clusters (k) determined automatically by maximizing a weighted version of the silhouette score. For the given dataset and ground truth, the fine-tuned keyword extraction model achieved an F1-score of more than 71%, while the standard model achieved just 20%, meaning that the fine-tuned model strikes a significantly better balance between precision and recall when extracting the desired technical keywords. In this context, the most effective clustering technique was the semi-supervised k-means algorithm, which dynamically selected between 22 and 27 clusters based on the knee point of the silhouette score. This project can substantially support the work of service employees, improving claim insights and resolution efficiency by delivering claims directly to the appropriate team.
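
The selection of k by silhouette score mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it uses the plain (unweighted) silhouette score rather than the weighted variant, omits the semi-supervised component, and replaces the reduced keyword vectors with hypothetical 2-D toy points. The function names `kmeans`, `silhouette`, and `choose_k` are illustrative assumptions.

```python
import math

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: each centre moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the centre of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean unweighted silhouette score (assumption: the thesis uses a weighted variant)."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(own) / len(own)                            # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in dists.values())   # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

def choose_k(points, candidates):
    """Return the candidate k with the highest mean silhouette, plus its labels."""
    best = None
    for k in candidates:
        labels = kmeans(points, k)
        score = silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best[1], best[2]

# Hypothetical stand-in for reduced keyword vectors: three separated groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 9), (5, 10), (6, 9), (6, 10)]
best_k, labels = choose_k(points, range(2, 7))
print("selected k:", best_k)
```

On this toy data the three well-separated groups yield the highest mean silhouette at k = 3. On real claim vectors the score curve is usually flatter, which may be why the abstract also refers to a knee point rather than a single sharp maximum.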

Faculty/Department: Dipartimento di Ingegneria dell'Informazione - DEI
Degree programme: COMPUTER ENGINEERING, Laurea Magistrale (master's degree, D.M. 270/2004)
Academic year: 2023
English title: Enhancing Customer Claims Analysis through Clustering and Textual Data Insights
Keywords: Keyword Extraction, Clustering, NLP, LLM, Claim Resolution
Supervisor: BERTOCCO, MATTEO
Co-supervisor: BODO, ROBERTO
Appears in collections: Lauree magistrali (master's theses)

Files in this item:

File	Size	Format
Friso_Giovanni.pdf (restricted access)	2.71 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/77006