Comparison of Clustering Techniques on Amazon Search Queries with Sentence Embeddings

ZILIO, THOMAS
2021/2022

Abstract

In recent years, Cluster Analysis has been shown to be a practical and effective Machine Learning tool for data exploration. The Text Mining field in particular has seen numerous studies that use this technique to analyze the large amounts of unstructured text data available in this context. The general approaches proposed in the literature are nonetheless intended for a setting where full, coherent documents are available for interpretation and comparison. Moreover, they are often limited by the techniques available at the time of their publication. The swift evolution of Natural Language Processing therefore provides great opportunities to devise novel and improved approaches. This work summarizes the results of a research project carried out during an internship at Amazon Ads, where Cluster Analysis was used to explore datasets of search queries: short sentences addressed to a search engine and, consequently, of dubious grammatical and syntactic correctness. The primary aim of the project is to identify a reproducible procedure for uncovering search intentions in Amazon search data, which could be used to produce actionable insights and recommendations for advertisers based on the discovered patterns. The approach consists of applying distinct clustering techniques to sentence embeddings of searches in different thematic contexts. The resulting clusters are first evaluated at a technical level via clustering-specific metrics. Then, a Topic Modelling structure is built on top of the best clusterings and evaluated via human judgment to assess the applicability of the results in the business context. The experiments reported in this thesis focus on the Italian marketplace, with the potential for expansion to multiple environments and regions.
2021
search queries
cluster analysis
sentence embeddings
tf-idf
Files in this item:
File: Zilio_Thomas.pdf (restricted access)
Size: 11.65 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/36689