Intelligent System Design for Advanced Anomaly Detection in Structured Datasets

ZANATTA, MICHELE
2022/2023

Abstract

Anomaly detection in structured datasets plays a pivotal role across a wide range of application domains, often serving as a critical tool for identifying significant, actionable insights. The primary objective of this thesis is to build an automatic pipeline designed to streamline anomaly detection in structured datasets. Central to our approach is the integration of both unsupervised and supervised learning methods. To address the lack of a direct metric for evaluating unsupervised models, we introduce a method that uses pseudo-labeling to assess the performance of unsupervised anomaly detection models. We describe the architecture and operation of the proposed pipeline, detailing how it adapts to labeled and unlabeled input datasets and which transformations and algorithms it employs. In our experiments and analyses, we show that while the pipeline may not always identify the single best model for every dataset, it consistently identifies models that perform better than those with default parameters. Crucially, it does so efficiently, saving significant time by avoiding extensive and costly exploratory data analysis and hyperparameter tuning. This balance of performance and speed marks a substantial improvement over existing methods in the domain of anomaly detection.
2022
Anomaly Detection
Structured Datasets
Automated Pipeline
Files in this item:
File: Zanatta_DataScience_Thesis.pdf
Access: restricted
Size: 1.24 MB
Format: Adobe PDF

The text of this website is © Università degli Studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/61401