Nonparametric density estimation for clustering (Stima non parametrica della densità per il clustering)

The rationale behind the modal formulation of the clustering problem is rooted in the idea of transforming the intuitive concept of clusters as dense sets of observations into a rigorous framework, where the notion of a group is directly tied to specific features of the probability density function underlying the data. Broadly speaking, a one-to-one correspondence can be established between clusters and the modal regions of the density, with the modes serving as archetypes of the clusters. The density function is typically estimated using nonparametric methods, most commonly the kernel density estimator. Population groups are then identified as the domains of attraction of the modes through a gradient ascent algorithm. The aim of this thesis is to explore a modification of the kernel density estimator, where the smoothing parameters are specific to each group. The selection of these parameters and the identification of the groups are carried out simultaneously through an iterative algorithm. Relevant applications and examples are presented to illustrate the method.
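The mode-seeking step mentioned in the abstract can be illustrated with a minimal sketch. It assumes one-dimensional data, a Gaussian kernel, and a single global bandwidth `h`, and it uses the standard mean-shift update as the gradient ascent scheme. It is not the group-specific bandwidth method developed in the thesis. The function names, the tolerances, and the toy data are hypothetical and chosen only for illustration.

```python
import math


def gaussian_kde(x, data, h):
    """Kernel density estimate at x with a Gaussian kernel and bandwidth h."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm


def mean_shift_mode(x, data, h, tol=1e-8, max_iter=1000):
    """Climb the estimated density from x towards a nearby mode.

    Each update moves x to the kernel-weighted mean of the data. For a
    Gaussian kernel this is a gradient ascent step on the density estimate
    with an adaptive step size.
    """
    for _ in range(max_iter):
        weights = [math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data]
        new_x = sum(w * xi for w, xi in zip(weights, data)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


def modal_clustering(data, h, merge_tol=1e-3):
    """Label each point by the mode that its ascent path converges to."""
    modes, labels = [], []
    for xi in data:
        m = mean_shift_mode(xi, data, h)
        for k, known in enumerate(modes):
            if abs(m - known) < merge_tol:
                labels.append(k)  # same domain of attraction as mode k
                break
        else:
            modes.append(m)  # a new mode, hence a new cluster
            labels.append(len(modes) - 1)
    return modes, labels


# Toy data (made up for illustration): two well-separated groups.
data = [-2.2, -2.0, -1.9, -1.8, -2.1, 1.8, 2.0, 2.1, 1.9, 2.2]
modes, labels = modal_clustering(data, h=0.5)
```

Points whose ascent paths end at the same mode share a label, so the labels partition the sample into the domains of attraction of the estimated modes. The result depends on `h`: a larger bandwidth smooths the estimate and merges modes, while a smaller one splits them.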

ZANATTA, RAUL

2024/2025

Record

Faculty/Department: Department of Statistical Sciences (Dipartimento di Scienze Statistiche)
Degree programme: Statistical Sciences, Master's degree (Laurea Magistrale, D.M. 270/2004)
Academic year: 2024
English title: Nonparametric density estimation for clustering
Keywords: Clustering; Nonparametric; Density estimation
Supervisor: MENARDI, GIOVANNA
Appears in collections: Master's theses (Lauree magistrali)

Files in this item:

File	Size	Format
Zanatta_Raul.pdf (restricted access)	2.57 MB	Adobe PDF

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/84100