Impact of Nominal Data Contamination and Fault Data Scarcity on Fault Classification Performance

This thesis examines how nominal data contamination and fault data scarcity affect the performance of anomaly classification models used in industrial fault detection. Both problems are common in real-world industrial environments, where nominal data is abundant but often contaminated with mislabeled fault points, and fault data is limited because faults are rare. Using the Glass Identification and Statlog datasets, the research evaluates Random Forest alongside two models that we developed, FLEX-C Label and FLEX-C Centroid. The results show that Random Forest struggles to learn effectively from contaminated and limited fault data, whereas our FLEX-C models maintain high performance and outperform Random Forest with only a small number of labeled fault samples. Moreover, our FLEX-C models integrate nominal data more effectively than a self-supervised approach. Our theoretical time complexity analysis shows that these models are computationally efficient and perform comparably to Random Forest. This study provides insights into improving industrial fault detection systems in real-world scenarios with imperfect data.


BAKIRCI, İCLEM NAZ

2024/2025



	Faculty/Department
	
				Dipartimento di Matematica "Tullio Levi-Civita" - DM
			
	Degree programme
	
				DATA SCIENCE, Master's degree (Laurea Magistrale, D.M. 270/2004)
			
	Academic year
	
				2024
			
	English title
	
				Impact of Nominal Data Contamination and Fault Data Scarcity on Fault Classification Performance
			
	Keywords
	
				Fault Classification
Machine Learning
Ensemble Approaches
			
	Supervisor
	
				SUSTO, GIAN ANTONIO
			
	Appears in collections:
	
				Master's theses (Lauree magistrali)

Files in this item:

File	Size	Format
thesis_iclem_naz_bakirci.pdf (restricted access)	2.19 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 License.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/89822