Enhancing Anomaly Detection: Integrating Human Feedback Through Active Learning
BAZ RADWAN, FAIROUZ
2023/2024
Abstract
In data-intensive fields, obtaining accurate labels is increasingly challenging, particularly in Anomaly Detection, where anomalies are context-dependent and difficult to define. This thesis addresses these challenges by integrating Active Learning techniques with the Isolation Forest algorithm to refine unsupervised Anomaly Detection and align it closely with user-specific definitions of anomalies. The research focuses on detecting anomalies in an unlabeled industrial dataset using Active Learning Isolation Forest (ALIF) and on incorporating Bayesian inference techniques (B-ALIF) to improve on ALIF's heuristic approaches. The primary goal is to investigate how these active labeling techniques affect anomaly scores and to compare their practicality and effectiveness in real-world industrial settings. We first apply HDBSCAN and Isolation Forest to detect potential anomalies, then introduce Active Learning techniques to incorporate feedback from domain experts. Using UMAP visualizations and the AcME algorithm for local explainability, we analyze how anomaly scores and feature importance evolve across updates. Importantly, we observe that each model identifies different types of anomalies as a consequence of how it works, which underscores the need for expert feedback in Anomaly Detection. The results show that, while ALIF adapts aggressively to labeled data, B-ALIF offers better adaptability and control through its offset parameter, ensuring that the structure of the prior model is not entirely discarded. When expert labels agree with prior predictions, B-ALIF reinforces them; when they differ, B-ALIF adjusts the model toward the expert's definition of anomalies without discarding prior knowledge. This research shows that Active Learning, particularly through B-ALIF, offers a more robust and balanced approach to unsupervised Anomaly Detection, enabling consistent refinement of models in industrial applications.
File: Data_Science_MsC_Thesis_Fairouz_Baz_Radwan.pdf (restricted access; size 2.72 MB; format Adobe PDF)
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 License.
https://hdl.handle.net/20.500.12608/71019