Center-based clustering is an important yet computationally difficult primitive in the realm of unsupervised learning and data analysis. We specifically focus on the k-median clustering problem in general metric spaces, in which one seeks to find a set of k centers, so to minimise the sum of distances from each point in the dataset to its closest center. In this thesis, we present and analyze efficient techniques to deal with the k-median clustering problem in the streaming setting – where the dataset is presented one point at a time and not accessible in its entirety – by leveraging the dimensionality of the dataset’s underlying metric space.

Center-based clustering is an important yet computationally difficult primitive in the realm of unsupervised learning and data analysis. We specifically focus on the k-median clustering problem in general metric spaces, in which one seeks to find a set of k centers, so to minimise the sum of distances from each point in the dataset to its closest center. In this thesis, we present and analyze efficient techniques to deal with the k-median clustering problem in the streaming setting – where the dataset is presented one point at a time and not accessible in its entirety – by leveraging the dimensionality of the dataset’s underlying metric space.

Streaming algorithms for center-based clustering in general metrics

BADIN, LUCA
2021/2022

Abstract

Center-based clustering is an important yet computationally difficult primitive in the realm of unsupervised learning and data analysis. We specifically focus on the k-median clustering problem in general metric spaces, in which one seeks to find a set of k centers, so to minimise the sum of distances from each point in the dataset to its closest center. In this thesis, we present and analyze efficient techniques to deal with the k-median clustering problem in the streaming setting – where the dataset is presented one point at a time and not accessible in its entirety – by leveraging the dimensionality of the dataset’s underlying metric space.
2021
Streaming algorithms for center-based clustering in general metrics
Center-based clustering is an important yet computationally difficult primitive in the realm of unsupervised learning and data analysis. We specifically focus on the k-median clustering problem in general metric spaces, in which one seeks to find a set of k centers, so to minimise the sum of distances from each point in the dataset to its closest center. In this thesis, we present and analyze efficient techniques to deal with the k-median clustering problem in the streaming setting – where the dataset is presented one point at a time and not accessible in its entirety – by leveraging the dimensionality of the dataset’s underlying metric space.
streaming algorithm
k-median
clustering
facility location
doubling dimension
File in questo prodotto:
File Dimensione Formato  
Badin_Luca.pdf

accesso aperto

Dimensione 513.43 kB
Formato Adobe PDF
513.43 kB Adobe PDF Visualizza/Apri

The text of this website © Università degli studi di Padova. Full Text are published under a non-exclusive license. Metadata are under a CC0 License

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.12608/40246