KNNIFE: a data-informed Feature Importance for Isolation Forest and its extensions
DE VIDI, RICCARDO
2025/2026
Abstract
Isolation Forest and its extensions are widely used for unsupervised anomaly detection across many fields due to their computational efficiency and scalability. However, these methods typically flag anomalous instances without explaining the factors contributing to the anomaly, which limits their practical adoption. To improve explainability for tabular datasets, two novel model-specific interpretation algorithms are introduced: KNNIFE (K-Nearest-Neighbor Isolation Forest Explanations) and soft KNNIFE. These methods leverage data-dependent information to provide both global and local feature importance scores for any variant of Isolation Forest. As a secondary contribution, this work proposes the Neural Network Isolation Forest (NNIF) algorithm, an Isolation Forest extension designed to serve as a testbed for the introduced interpretation methods. Experimental evaluations on both synthetic and real-world datasets, using the Area Under the Curve of Feature Selection and the correlation between Local Feature Importances and anomaly scores as metrics, demonstrate that KNNIFE and soft KNNIFE achieve performance comparable to or superior to baseline interpretation methods for the Isolation Forest and Extended Isolation Forest models. Furthermore, the proposed methods are also highly effective for interpreting NNIF, which in turn proves more successful than DIF at accurately identifying anomalous instances.
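For context on the building blocks the abstract refers to, the sketch below shows how an Isolation Forest assigns anomaly scores and how a k-nearest-neighbor statistic can be turned into data-dependent, per-feature relevance scores. It is a minimal illustration assuming scikit-learn's `IsolationForest` and `NearestNeighbors`; the importance heuristic is a hypothetical stand-in and is not the KNNIFE or soft KNNIFE algorithm developed in the thesis.

```python
# Minimal sketch (illustrative only, not the thesis's KNNIFE algorithm):
# score anomalies with an Isolation Forest, then derive a naive KNN-informed
# local feature relevance. The data and the heuristic are assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
X[:10, 0] += 6.0  # inject anomalies along feature 0

# 1. Unsupervised anomaly scoring with Isolation Forest.
forest = IsolationForest(n_estimators=100, random_state=0).fit(X)
anomaly_score = -forest.score_samples(X)  # higher = more anomalous

# 2. Illustrative local "feature importance": how far each point lies from the
#    mean of its k nearest neighbours along each feature (a data-dependent proxy).
k = 10
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
_, idx = nn.kneighbors(X)
neighbour_means = X[idx[:, 1:]].mean(axis=1)      # exclude the point itself
local_importance = np.abs(X - neighbour_means)    # shape (n_samples, n_features)
local_importance /= local_importance.sum(axis=1, keepdims=True)

# 3. Global importance as the score-weighted average of local importances.
global_importance = (local_importance * anomaly_score[:, None]).mean(axis=0)
print("global feature relevance:", np.round(global_importance, 3))
```

Run on the synthetic data above, the score-weighted average concentrates relevance on feature 0, which is the kind of behaviour the Area Under the Curve of Feature Selection metric mentioned in the abstract is designed to reward.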
| File | Size | Format | Access |
|---|---|---|---|
| DeVidi_Riccardo.pdf | 7.37 MB | Adobe PDF | Restricted access |
https://hdl.handle.net/20.500.12608/106017