Enhancing Anomaly Detection: Integrating Human Feedback Through Active Learning

BAZ RADWAN, FAIROUZ

2023/2024

Abstract

In data-intensive fields, obtaining accurate labels is increasingly challenging, particularly in Anomaly Detection, where anomalies are context-dependent and difficult to define. This thesis addresses these challenges by integrating Active Learning techniques with the Isolation Forest algorithm to refine unsupervised Anomaly Detection and align it closely with user-specific anomaly definitions. The research focuses on detecting anomalies in an unlabeled industrial dataset using Active Learning Isolation Forest (ALIF) and on incorporating Bayesian inference techniques (B-ALIF) to enhance ALIF's heuristic approaches. The primary goal is to investigate the impact of these active labeling techniques on anomaly scores and to compare their practicality and effectiveness in real-world industrial settings.

We first apply HDBSCAN and Isolation Forest to detect potential anomalies, then introduce Active Learning techniques to incorporate domain expert feedback. Using UMAP visualizations and the AcME algorithm for local explainability, we analyze how anomaly scores and feature importance evolve across updates. Importantly, we observe that each model identifies different types of anomalies, owing to the way each model works, which underscores the need for expert feedback in Anomaly Detection.

The results demonstrate that while ALIF adapts aggressively to labeled data, B-ALIF provides better adaptability and control through the offset parameter, ensuring that the prior model's structure is not entirely discarded. When expert labels align with prior predictions, B-ALIF reinforces them; when they differ, B-ALIF adjusts the model to align with the expert's definition of anomalies without discarding prior knowledge. This research shows that Active Learning, particularly through B-ALIF, offers a more robust and balanced approach to unsupervised Anomaly Detection, allowing for consistent refinement of models in industrial applications.
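
As a rough illustration of the starting point described in the abstract, the sketch below scores points with a minimal Isolation Forest and then selects the highest-scoring unlabeled points as candidates to show a domain expert. It is not the thesis's implementation: the function names (`anomaly_scores`, `query_candidates`), the toy data, and the "query the top-scoring points" strategy are assumptions made for illustration, and the ALIF and B-ALIF update rules are deliberately not reproduced here.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful search in a binary search
    tree of n points (harmonic number approximated by ln + Euler's constant)."""
    if n <= 1:
        return 0.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on a constant feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Depth at which the point lands, plus a correction for unsplit leaves."""
    if "split" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if point[node["dim"]] < node["split"] else node["right"]
    return path_length(point, child, depth + 1)


def anomaly_scores(data, n_trees=100, sample_size=64, seed=0):
    """Isolation Forest score in (0, 1]; higher means more anomalous.
    Assumes at least two points in data."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = c_factor(sample_size)
    return [
        2.0 ** (-sum(path_length(p, t) for t in trees) / (n_trees * norm))
        for p in data
    ]


def query_candidates(scores, labeled, k=3):
    """Hypothetical query strategy: indices of the k highest-scoring
    points that the expert has not labeled yet."""
    unlabeled = (i for i in range(len(scores)) if i not in labeled)
    return sorted(unlabeled, key=lambda i: scores[i], reverse=True)[:k]


# Toy data (made up): a Gaussian cluster plus one distant point.
demo_rng = random.Random(42)
data = [(demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)) for _ in range(64)]
data.append((10.0, 10.0))

scores = anomaly_scores(data, sample_size=len(data))
print(query_candidates(scores, labeled=set(), k=3))
```

In an active-learning loop, the expert's answers for the returned indices would then be fed back into the model; how that feedback changes the scores is exactly where ALIF and B-ALIF differ, and is left out of this sketch.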

	Faculty/Department
	
				Dipartimento di Matematica "Tullio Levi-Civita" - DM
			
	Degree programme
	
				DATA SCIENCE Master's degree (D.M. 270/2004)
			
	Academic year
	
				2023
			
	English title
	
				Enhancing Anomaly Detection: Integrating Human Feedback Through Active Learning
			
	Keywords
	
				Anomaly Detection
Active Learning
Machine Learning
			
	Supervisor
	
				SUSTO, GIAN ANTONIO
			
	Appears in collections:
	
				Master's degree theses

Files in this item:

File	Size	Format
Data_Science_MsC_Thesis_Fairouz_Baz_Radwan.pdf (restricted access)	2.72 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/71019