Representative Itemsets Mining: A Clustering Approach
SENO, GIACOMO
2022/2023
Abstract
In this thesis we study the problem of finding a good representation of a database of transactions. Previous work proposes an approach based on lossless compression of the database. This thesis focuses instead on lossy compression and studies a clustering approach. Given a set of transactions, the clustering model we present tries to find the best representative itemsets by treating them as clusters. The model is defined by an objective function that favors a small number of clusters while seeking the best clustering, which is obtained by assigning to each representative itemset some subtransactions taken from the input database. We present our algorithm and its results on synthetic datasets.
File | Size | Format |
---|---|---|
Seno_Giacomo.pdf (open access) | 1.21 MB | Adobe PDF |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/58022