Comparison of Isolation Forest and Clustering Methods for Outlier Detection

BEJAJ, XHACU
2024/2025

Abstract

In this thesis, we study the relationship between the notion of outlier employed by Isolation Forest and the one employed by the 3-approximation algorithm for the k-center with z outliers problem. The strategies of both algorithms are influenced by the concept of density, which motivates our comparison. We also design a new method that uses Isolation Forest as a preprocessing step to solve the k-center with z outliers problem efficiently. Our experimental analysis shows that, depending on the type of outlier, the two methods do not always return similar sets of outliers; nonetheless, the outliers they return have a comparable degree of outlyingness. Furthermore, the proposed method shows substantial efficiency gains, with nearly linear complexity as opposed to the more than quadratic complexity of the classical 3-approximation algorithm.
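
To make the problem in the abstract concrete, the following is a minimal sketch of the k-center with z outliers objective and of a generic "filter, then cluster" pipeline. It is not the method of the thesis: the function names are hypothetical, the outlier score is a crude stand-in (distance from the coordinate-wise median) for Isolation Forest scores, and the clustering step uses the greedy farthest-first heuristic for plain k-center rather than the 3-approximation algorithm.

```python
import math
import statistics


def radius_with_outliers(points, centers, z):
    """Objective of k-center with z outliers: the largest distance from a
    point to its nearest center, after discarding the z farthest points."""
    d = sorted(min(math.dist(p, c) for c in centers) for p in points)
    kept = d[:len(d) - z] if z > 0 else d
    return kept[-1] if kept else 0.0


def farthest_first(points, k):
    """Greedy farthest-first traversal for plain k-center (no outliers).
    Swapped in here for simplicity; it is NOT the 3-approximation algorithm."""
    centers = [points[0]]
    d = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        i = max(range(len(points)), key=d.__getitem__)
        centers.append(points[i])
        d = [min(d[j], math.dist(points[j], points[i]))
             for j in range(len(points))]
    return centers


def filter_then_cluster(points, scores, k, z):
    """Hypothetical pipeline: flag the z highest-scoring points as outliers,
    then choose k centers among the remaining points."""
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    flagged = set(order[:z])
    kept = [p for i, p in enumerate(points) if i not in flagged]
    return farthest_first(kept, k), sorted(flagged)


# Toy data: two tight clusters plus two far-away points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50), (-40, 60)]

# Stand-in outlier score (an assumption): distance from the coordinate-wise
# median. With scikit-learn, one could instead use the negated output of
# IsolationForest.score_samples, where lower raw values mean more abnormal.
median = tuple(statistics.median(c) for c in zip(*points))
scores = [math.dist(p, median) for p in points]

centers, flagged = filter_then_cluster(points, scores, k=2, z=2)
r = radius_with_outliers(points, centers, z=2)
r_no_outliers = radius_with_outliers(points, centers, z=0)
```

On this toy input the two far-away points are flagged, and the radius computed with z = 2 is much smaller than the radius computed with z = 0, which is the effect the outlier-aware objective is meant to capture.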
2024
Keywords: outliers, isolation forest, k-center
Files in this item:
Bejaj_Xhacu.pdf (open access, 761.42 kB, Adobe PDF)

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/86927