Representative Itemsets Mining: A Clustering Approach

SENO, GIACOMO
2022/2023

Abstract

In this thesis we study the problem of finding a good representation of a database of transactions. Previous work approaches this problem through lossless compression of the database. This thesis focuses instead on lossy compression and studies a clustering approach. Given a set of transactions, the clustering model we present looks for the best representative itemsets by treating them as clusters. The model is defined by an objective function that minimizes the number of clusters while seeking the best clustering, assigning to each representative itemset some subtransactions taken from the input database. We present our algorithm and its results on synthetic datasets.
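To make the vocabulary of the abstract concrete, the following is a minimal Python sketch of what "assigning subtransactions to representative itemsets" could look like. It is not the algorithm of the thesis: the assignment rule (largest overlap), the cost function, and the names `assign_subtransactions`, `objective`, and `cluster_penalty` are all assumptions introduced only for illustration.

```python
from collections import defaultdict


def assign_subtransactions(transactions, representatives):
    """Group subtransactions under the representative itemset they overlap most.

    Assumed rule (not from the thesis): each transaction is matched to the
    representative sharing the most items with it, and only the shared items
    (the subtransaction) are kept. Items outside the representative are lost,
    which is what makes the representation lossy.
    """
    clusters = defaultdict(list)
    for t in transactions:
        best = max(representatives, key=lambda r: len(t & r))
        sub = t & best
        if sub:  # skip transactions that share nothing with any representative
            clusters[best].append(sub)
    return dict(clusters)


def objective(transactions, clusters, cluster_penalty=1.0):
    """Hypothetical cost: a penalty per used cluster plus the number of lost items."""
    covered = sum(len(s) for subs in clusters.values() for s in subs)
    total = sum(len(t) for t in transactions)
    return cluster_penalty * len(clusters) + (total - covered)


if __name__ == "__main__":
    transactions = [
        frozenset("abc"),
        frozenset("ab"),
        frozenset("cd"),
        frozenset("d"),
    ]
    representatives = [frozenset("ab"), frozenset("cd")]
    clusters = assign_subtransactions(transactions, representatives)
    for rep, subs in clusters.items():
        print(sorted(rep), [sorted(s) for s in subs])
    print("cost:", objective(transactions, clusters))
```

In this toy input the transaction {a, b, c} is stored under the representative {a, b} as the subtransaction {a, b}, so the item c is dropped. Under the assumed cost, using fewer representatives lowers the cluster term but typically raises the number of lost items, which is the kind of trade-off an objective of this shape has to balance.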
2022
Keywords: Mining, Clustering, Itemsets, Transactions, Representation

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/58022