Impact of Nominal Data Contamination and Fault Data Scarcity on Fault Classification Performance

BAKIRCI, İCLEM NAZ
2024/2025

Abstract

This thesis examines how nominal data contamination and fault data scarcity affect the performance of anomaly classification models used in industrial fault detection. Both challenges are common in real-world industrial environments, where nominal data is abundant but often contaminated with fault points mislabeled as nominal, and fault data is scarce because faults are rare. Using the Glass Identification and Statlog datasets, the study compares Random Forest with the FLEX-C Label and FLEX-C Centroid models that we developed. The results show that Random Forest struggles to learn effectively from contaminated and limited fault data, whereas our FLEX-C models maintain high performance and outperform Random Forest with only a small number of labeled fault samples. Moreover, our FLEX-C models integrate nominal data more effectively than a self-supervised approach. Our theoretical time complexity analysis indicates that these models are computationally efficient, with a cost comparable to that of Random Forest. This study provides insights into improving industrial fault detection systems in real-world scenarios with imperfect data.
2024
Fault Classification
Machine Learning
Ensemble Approaches
Files in this item:
File: thesis_iclem_naz_bakirci.pdf
Access: restricted
Size: 2.19 MB
Format: Adobe PDF

The text of this website © Università degli Studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/89822