KNNIFE: a data-informed Feature Importance for Isolation Forest and its extensions

De Vidi, Riccardo
2025/2026

Abstract

Isolation Forest and its extensions are widely used for unsupervised anomaly detection across many fields due to their computational efficiency and scalability. However, these methods typically flag anomalous instances without explaining the factors contributing to the anomaly, which limits their practical adoption. To improve explainability for tabular datasets, two novel model-specific interpretation algorithms are introduced: KNNIFE (K-Nearest-Neighbor Isolation Forest Explanations) and soft KNNIFE. These methods leverage data-dependent information to provide both global and local feature importance scores for any variant of Isolation Forest. As a secondary contribution, this work proposes the Neural Network Isolation Forest (NNIF) algorithm, an Isolation Forest extension designed to serve as a testbed for the introduced interpretations. Experimental evaluations on both synthetic and real-world datasets, using the Area Under the Curve of Feature Selection and the correlation between local feature importances and anomaly scores as metrics, demonstrate that KNNIFE and soft KNNIFE perform comparably to or better than baseline interpretation methods for the Isolation Forest and Extended Isolation Forest models. Furthermore, the proposed methods are also highly effective for interpreting NNIF, which in turn proves more successful than Deep Isolation Forest (DIF) in accurately identifying anomalous instances.
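For orientation, the base model that KNNIFE and its variants interpret is the standard Isolation Forest, available in scikit-learn. The sketch below is not from the thesis; the synthetic data, contamination level, and parameter choices are illustrative assumptions, intended only to show the anomaly-scoring interface the abstract refers to:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Illustrative tabular data: 200 inliers around the origin, 5 obvious outliers
normal = rng.normal(0.0, 1.0, size=(200, 2))
outliers = rng.uniform(6.0, 8.0, size=(5, 2))
X = np.vstack([normal, outliers])

# Standard Isolation Forest; contamination set to the known outlier fraction
# here purely for demonstration (in practice it is usually unknown)
forest = IsolationForest(n_estimators=100, contamination=5 / 205, random_state=0)
labels = forest.fit_predict(X)        # -1 = flagged anomaly, 1 = inlier
scores = forest.decision_function(X)  # lower score = more anomalous
```

Note that `labels` only says *which* points are anomalous, not *why*: no per-feature attribution is produced, which is exactly the gap the thesis's feature-importance methods address.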
Keywords

Machine Learning
Anomaly Detection
Interpretability
Files in this item:

DeVidi_Riccardo.pdf (restricted access), 7.37 MB, Adobe PDF
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license; metadata are released under a CC0 license.

Use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12608/106017