Representative Itemsets Mining: A Clustering Approach

This thesis studies the problem of finding a good representation of a database of transactions. Previous work proposes an approach based on lossless compression of the database. This thesis focuses instead on lossy compression and studies a clustering approach. Given a set of transactions, the clustering model we present looks for the best representative itemsets by treating them as clusters. The model is defined by an objective function that aims to minimize the number of clusters while obtaining the best clustering, assigning to each representative itemset some subtransactions taken from the input database. We present our algorithm and its results on synthetic datasets.

SENO, GIACOMO

2022/2023

Faculty/Department: Dipartimento di Ingegneria dell'Informazione - DEI
Degree program: Computer Engineering, Laurea Magistrale (master's degree, D.M. 270/2004)
Academic year: 2022
Keywords: Mining, Clustering, Itemsets, Transactions, Representation
Supervisor: VANDIN, FABIO
Appears in document types: Lauree magistrali (master's theses)

Files in this item:

File	Size	Format
Seno_Giacomo.pdf (open access)	1.21 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/58022