Impact of Nominal Data Contamination and Fault Data Scarcity on Fault Classification Performance
BAKIRCI, İCLEM NAZ
2024/2025
Abstract
This thesis examines how nominal data contamination and fault data scarcity affect the performance of anomaly classification models used in industrial fault detection. Both challenges are common in real-world industrial environments, where nominal data is abundant but often contaminated with mislabeled fault points, and fault data is scarce because faults are rare. Using the Glass Identification and Statlog datasets, the research evaluates Random Forest alongside two models that we developed, FLEX-C Label and FLEX-C Centroid. The results show that Random Forest struggles to learn effectively from contaminated and limited fault data, whereas our FLEX-C models maintain high performance and outperform Random Forest with only a small number of labeled fault samples. Moreover, our FLEX-C models integrate nominal data more effectively than a self-supervised approach. Our theoretical time complexity analysis indicates that these models are computationally efficient and comparable to Random Forest. This study provides insights into improving industrial fault detection systems in real-world scenarios with imperfect data.

| File | Size | Format |
|---|---|---|
| thesis_iclem_naz_bakirci.pdf (restricted access) | 2.19 MB | Adobe PDF |
https://hdl.handle.net/20.500.12608/89822