Customer Segmentation Using Clustering Algorithms: A comparative study for an Italian Fashion Retail Company.
CIFUENTES BOHORQUEZ, SULY VANNESA
2024/2025
Abstract
This thesis aims to identify the most effective clustering algorithm for customer segmentation using the RFM+P (Recency, Frequency, Monetary, Product Variety) model, applied to over 1.5 million transaction records from an Italian retail company. RFM+P describes customer engagement along four dimensions: recency (time since the last purchase), frequency (number of transactions), monetary value (total spending), and product variety (the range of distinct products purchased). Each metric is standardized so that the four dimensions are comparable, giving a comprehensive picture of customer behavior. This methodology enables businesses to segment their customer base and tailor marketing strategies, optimizing resource allocation and improving customer retention and lifetime value. The dataset was analyzed with three clustering algorithms: K-Means, Gaussian Mixture Model (GMM), and BIRCH. Visual analysis through 3D and 2D projections showed that K-Means produced well-separated and balanced clusters, making it the most interpretable and actionable method for marketing applications. BIRCH, while computationally efficient, produced a segmentation similar to that of K-Means but with more overlap between clusters, which reduced their distinctiveness and interpretability. GMM failed to form clear customer groups: in the cluster distribution plots, most data points fell into a single cluster, which limits its usefulness for targeted marketing. Based on these results, K-Means was chosen as the optimal algorithm, allowing the company to identify high-value frequent buyers, moderately engaged customers, and dormant clients. These insights enabled the development of targeted marketing strategies, including loyalty programs, personalized promotions, and re-engagement campaigns.
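As an illustration of the RFM+P construction described above, the following is a minimal sketch, not the thesis's actual pipeline. It uses only the Python standard library and a hypothetical toy transaction list. The record layout, the snapshot date, the choice of counting each row as one transaction, and the use of z-scores as the standardization are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev

# Hypothetical toy data: (customer_id, purchase_date, product_id, amount).
# Each row is treated as one transaction (an assumption for this sketch).
transactions = [
    ("c1", date(2024, 1, 5), "shirt", 40.0),
    ("c1", date(2024, 2, 14), "jeans", 60.0),
    ("c1", date(2024, 3, 1), "shirt", 40.0),
    ("c2", date(2023, 11, 20), "coat", 150.0),
    ("c3", date(2024, 2, 10), "scarf", 20.0),
    ("c3", date(2024, 2, 25), "scarf", 20.0),
]
reference_date = date(2024, 3, 31)  # assumed snapshot date for recency


def rfmp_features(rows, today):
    """Return {customer_id: (recency_days, frequency, monetary, variety)}."""
    by_customer = defaultdict(list)
    for customer_id, day, product, amount in rows:
        by_customer[customer_id].append((day, product, amount))

    result = {}
    for customer_id, items in by_customer.items():
        last_purchase = max(day for day, _, _ in items)
        result[customer_id] = (
            (today - last_purchase).days,                # recency
            len(items),                                  # frequency
            sum(amount for _, _, amount in items),       # monetary value
            len({product for _, product, _ in items}),   # product variety
        )
    return result


def standardize(feature_map):
    """Z-score each of the four metrics across customers."""
    ids = list(feature_map)
    columns = list(zip(*(feature_map[cid] for cid in ids)))
    scaled_columns = []
    for column in columns:
        mu, sigma = mean(column), pstdev(column)
        # A constant column carries no information; map it to zeros.
        scaled_columns.append(
            [(value - mu) / sigma if sigma else 0.0 for value in column]
        )
    return {
        cid: tuple(column[i] for column in scaled_columns)
        for i, cid in enumerate(ids)
    }


features = rfmp_features(transactions, reference_date)
scaled = standardize(features)
for customer_id in sorted(scaled):
    print(customer_id, [round(value, 2) for value in scaled[customer_id]])
```

The standardized vectors are what a distance-based algorithm such as K-Means would take as input. Without this step, a metric measured in large units, such as total spending, would likely outweigh the others in the distance computation.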
Each clustering algorithm has its own strengths, but K-Means performed best on the evaluated dataset, providing well-defined and actionable customer segments that support more effective marketing decisions and business growth.

File | Size | Format
---|---|---
CifuentesBohorquez_SulyVannesa.pdf (restricted access) | 2.68 MB | Adobe PDF
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/82084