Intelligent System Design for Advanced Anomaly Detection in Structured Datasets

Anomaly detection in structured datasets matters across many application domains, where it often surfaces significant, actionable insights. This thesis builds an automatic pipeline designed to improve anomaly detection in structured datasets. The approach integrates unsupervised and supervised learning methodologies. To address the lack of a direct metric for evaluating unsupervised models, we introduce a method that uses pseudo-labeling to assess the performance of unsupervised anomaly detection models. We describe the architecture and operation of the proposed pipeline, detailing how it adapts to labeled or unlabeled input datasets and which transformations and algorithms it employs. Our experiments and analyses show that, while the pipeline may not always find the best model for every dataset, it consistently identifies models that outperform those with default parameters. It does so efficiently, saving significant time by avoiding extensive and costly exploratory data analysis and hyperparameter tuning. This balance of performance and speed is a substantial improvement over existing anomaly detection methods.
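The abstract does not say how the pseudo-labels are produced. As a minimal sketch only, assuming they come from a majority vote among several simple detectors on a single numeric column, candidate detectors can then be ranked by their F1 score against that vote. All names below (`zscore_scores`, `mad_scores`, `raw_value_scores`, `rank_detectors`) are hypothetical illustrations and are not taken from the thesis.

```python
import statistics


def zscore_scores(xs):
    """Score = distance from the mean, in standard deviations."""
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs) or 1.0  # avoid division by zero
    return [abs(x - mean) / std for x in xs]


def mad_scores(xs):
    """Score = distance from the median, in median absolute deviations."""
    med = statistics.median(xs)
    mad = statistics.median(abs(x - med) for x in xs) or 1.0
    return [abs(x - med) / mad for x in xs]


def raw_value_scores(xs):
    """Deliberately weak candidate: only large values look anomalous."""
    return list(xs)


def top_k_flags(scores, k):
    """Flag points whose score reaches the k-th highest score (ties included)."""
    cutoff = sorted(scores, reverse=True)[k - 1]
    return [s >= cutoff for s in scores]


def pseudo_labels(flag_sets):
    """Assumed rule: a point is a pseudo-anomaly if more than half of the detectors flag it."""
    n_detectors = len(flag_sets)
    return [2 * sum(votes) > n_detectors for votes in zip(*flag_sets)]


def f1(pred, ref):
    """F1 score of predicted flags against reference (pseudo) labels."""
    tp = sum(p and r for p, r in zip(pred, ref))
    fp = sum(p and not r for p, r in zip(pred, ref))
    fn = sum(r and not p for p, r in zip(pred, ref))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def rank_detectors(xs, detectors, k):
    """Return each detector's F1 against the majority-vote pseudo-labels, plus the labels."""
    flags = {name: top_k_flags(fn(xs), k) for name, fn in detectors.items()}
    labels = pseudo_labels(list(flags.values()))
    return {name: f1(f, labels) for name, f in flags.items()}, labels


if __name__ == "__main__":
    data = [10.0, 10.2, 9.9, 10.1, 9.8, 10.3, 10.0, 25.0, 9.7, -5.0]
    detectors = {
        "zscore": zscore_scores,
        "mad": mad_scores,
        "raw_value": raw_value_scores,
    }
    scores, labels = rank_detectors(data, detectors, k=2)
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name}: F1 vs pseudo-labels = {score:.2f}")
```

A score computed this way measures agreement with the consensus, not correctness against ground truth, so it is useful for comparing candidates on unlabeled data but should not be read as a true detection rate.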

ZANATTA, MICHELE

2022/2023

Record

Faculty/Department	Dipartimento di Matematica "Tullio Levi-Civita" - DM
Degree programme	DATA SCIENCE, Master's degree (Laurea Magistrale, D.M. 270/2004)
Academic year	2022
English title	Intelligent System Design for Advanced Anomaly Detection in Structured Datasets
Keywords	Anomaly Detection; Structured Datasets; Automated Pipeline
Supervisor	ERSEGHE, TOMASO
Appears in collections	Master's theses

Files in this item:

File	Size	Format
Zanatta_DataScience_Thesis.pdf (restricted access)	1.24 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/61401