Intelligent System Design for Advanced Anomaly Detection in Structured Datasets
ZANATTA, MICHELE
2022/2023
Abstract
Anomaly detection in structured datasets plays a pivotal role across many application domains, where it often serves as a critical tool for surfacing significant, actionable insights. The primary objective of this thesis is to build an automatic pipeline designed to streamline anomaly detection in structured datasets. Central to our approach is the integration of unsupervised and supervised learning methods. To address the lack of a direct metric for evaluating unsupervised models, we introduce a method that uses pseudo-labeling to assess the performance of unsupervised anomaly detection models. We describe the architecture and operation of the proposed pipeline, detailing how it adapts to labeled and unlabeled input datasets and which transformations and algorithms it employs. Our experiments and analyses show that, while the pipeline may not always pinpoint the single best model for every dataset, it consistently identifies models that outperform models run with default parameters. Crucially, it does so efficiently, saving significant time by avoiding extensive and costly exploratory data analysis and hyperparameter tuning. This balance of performance and speed marks a substantial improvement over existing methods in the domain of anomaly detection.

File | Size | Format
---|---|---
Zanatta_DataScience_Thesis.pdf (restricted access) | 1.24 MB | Adobe PDF
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/61401