KNNIFE: a data-informed Feature Importance for Isolation Forest and its extensions
DE VIDI, RICCARDO
2025/2026
Abstract
Isolation Forest and its extensions are widely used for unsupervised anomaly detection across many fields due to their computational efficiency and scalability. However, these methods typically flag anomalous instances without explaining the factors contributing to the anomaly, which limits their practical adoption. To improve explainability for tabular datasets, two novel model-specific interpretation algorithms are introduced: KNNIFE (K-Nearest-Neighbor Isolation Forest Explanations) and soft KNNIFE. These methods leverage data-dependent information to provide both global and local feature importance scores for any variant of Isolation Forest. As a secondary contribution, this work proposes the Neural Network Isolation Forest (NNIF) algorithm, an Isolation Forest extension designed to serve as a testbed for the introduced interpretation methods. Experimental evaluations on both synthetic and real-world datasets, using the Area Under the Curve of Feature Selection and the correlation between Local Feature Importances and anomaly scores as metrics, demonstrate that KNNIFE and soft KNNIFE achieve performance comparable to or superior to baseline interpretation methods for the Isolation Forest and Extended Isolation Forest models. Furthermore, the proposed methods are also highly effective for interpreting NNIF, which in turn proves more successful than DIF at accurately identifying anomalous instances.
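For context on the building blocks the abstract refers to, the sketch below shows how an Isolation Forest assigns anomaly scores and how a k-nearest-neighbor statistic can be turned into data-dependent, per-feature relevance scores. It is a minimal illustration assuming scikit-learn's `IsolationForest` and `NearestNeighbors`; the importance heuristic is a hypothetical stand-in and is not the KNNIFE or soft KNNIFE algorithm developed in the thesis.

```python
# Minimal sketch (illustrative only, not the thesis's KNNIFE algorithm):
# score anomalies with an Isolation Forest, then derive a naive KNN-informed
# local feature relevance. The data and the heuristic are assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
X[:10, 0] += 6.0  # inject anomalies along feature 0

# 1. Unsupervised anomaly scoring with Isolation Forest.
forest = IsolationForest(n_estimators=100, random_state=0).fit(X)
anomaly_score = -forest.score_samples(X)  # higher = more anomalous

# 2. Illustrative local "feature importance": how far each point lies from the
#    mean of its k nearest neighbours along each feature (a data-dependent proxy).
k = 10
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
_, idx = nn.kneighbors(X)
neighbour_means = X[idx[:, 1:]].mean(axis=1)      # exclude the point itself
local_importance = np.abs(X - neighbour_means)    # shape (n_samples, n_features)
local_importance /= local_importance.sum(axis=1, keepdims=True)

# 3. Global importance as the score-weighted average of local importances.
global_importance = (local_importance * anomaly_score[:, None]).mean(axis=0)
print("global feature relevance:", np.round(global_importance, 3))
```

Run on the synthetic data above, the score-weighted average concentrates relevance on feature 0, which is the kind of behaviour the Area Under the Curve of Feature Selection metric mentioned in the abstract is designed to reward.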
| File | Size | Format | Access |
|---|---|---|---|
| DeVidi_Riccardo.pdf | 7.37 MB | Adobe PDF | Restricted access |
https://hdl.handle.net/20.500.12608/106017