Center-based clustering is a fundamental primitive for data analysis and is very challenging for large datasets. We developed coreset based space/round-efficient MapReduce algorithms to solve the k-center, k-median, and k-means variants in general metrics. Remarkably, the algorithms obliviously adapt to the doubling dimension of the metric space, and attain approximation ratios that can be made arbitrarily close to those achievable by the best known polynomial-time sequential approximations.

Distributed Clustering in General Metrics via Coresets

Mazzetto, Alessio
2019/2020

Abstract

Center-based clustering is a fundamental primitive for data analysis and is very challenging for large datasets. We developed coreset based space/round-efficient MapReduce algorithms to solve the k-center, k-median, and k-means variants in general metrics. Remarkably, the algorithms obliviously adapt to the doubling dimension of the metric space, and attain approximation ratios that can be made arbitrarily close to those achievable by the best known polynomial-time sequential approximations.
2019-07-09
clustering, coreset, doubling dimension, MapReduce​
File in questo prodotto:
File Dimensione Formato  
thesis.pdf

accesso aperto

Dimensione 832.25 kB
Formato Adobe PDF
832.25 kB Adobe PDF Visualizza/Apri

The text of this website © Università degli studi di Padova. Full Text are published under a non-exclusive license. Metadata are under a CC0 License

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.12608/28042