Enhancing Anomaly Detection: Integrating Human Feedback Through Active Learning

BAZ RADWAN, FAIROUZ
2023/2024

Abstract

In data-intensive fields, obtaining accurate labels is increasingly challenging, particularly in Anomaly Detection, where anomalies are context-dependent and difficult to define. This thesis addresses these challenges by integrating Active Learning techniques with the Isolation Forest algorithm to refine unsupervised Anomaly Detection and align it closely with user-specific anomaly definitions. The research focuses on detecting anomalies in an unlabeled industrial dataset using Active Learning Isolation Forest (ALIF) and on incorporating Bayesian inference techniques (B-ALIF) to enhance ALIF's heuristic approaches. The primary goal is to investigate the impact of these active labeling techniques on anomaly scores and to compare their practicality and effectiveness in real-world industrial settings. We first apply HDBSCAN and Isolation Forest to detect potential anomalies, then introduce Active Learning techniques to incorporate domain expert feedback. Using UMAP visualizations and the AcME algorithm for local explainability, we analyze how anomaly scores and feature importance evolve across updates. Importantly, we observe that each model identifies different types of anomalies, driven by how it operates, which underscores the need for expert feedback in Anomaly Detection. The results show that while ALIF adapts aggressively to labeled data, B-ALIF provides better adaptability and control through the offset parameter, ensuring that the prior model's structure is not entirely discarded. When expert labels align with prior predictions, B-ALIF reinforces them; when they differ, B-ALIF adjusts the model to match the expert's definition of anomalies without discarding prior knowledge. This research shows that Active Learning, particularly through B-ALIF, offers a more robust and balanced approach to unsupervised Anomaly Detection, allowing for consistent refinement of models in industrial applications.
2023
Anomaly Detection
Active Learning
Machine Learning
Files in this item:
File: Data_Science_MsC_Thesis_Fairouz_Baz_Radwan.pdf (restricted access)
Size: 2.72 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 License.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/71019