Anomaly Detection for Industry 5.0: A Comparative Study of Supervised, Unsupervised, and Multimodal Approaches in Human-Centered Turning Processes

This thesis investigates anomaly detection in industrial products, with a particular focus on the metal manufacturing sector. Using high-resolution image data provided by Zannini S.p.A., we evaluate a wide spectrum of techniques, including supervised, unsupervised, and semi-supervised learning methods, alongside emerging approaches based on vision-language models (VLMs), zero-shot anomaly detection, and multimodal large language models (MLLMs). In particular, we implement Anomaly-OV, a recent framework for zero-shot anomaly detection and reasoning that supports both anomaly localization and natural language explanation.

Our methodology combines nested K-Fold cross-validation with extensive data augmentation techniques such as gamma correction, rotation, zoom, and flipping to improve robustness and generalization. Model performance is systematically assessed using standard metrics including accuracy, AUC, and F1-score. Some approaches rely exclusively on image data, while others integrate image and text modalities, enabling interactive chatbot-style querying of anomalies during inference.

The experimental results indicate that supervised techniques consistently achieve the highest detection accuracy, but their effectiveness depends on access to large labeled datasets, which are costly and difficult to obtain in practice. Unsupervised approaches, which rely only on normal samples, ease the labeling burden but generally produce lower precision. Zero-shot and MLLM-based methods can operate with minimal training data, offering adaptability and interpretability, albeit with somewhat lower accuracy than supervised methods.

We compare these approaches and highlight the trade-offs between accuracy, the amount of training data required, and explainability. While supervised methods remain the most accurate, MLLM-based techniques contribute valuable interpretability and interactive reasoning. Finally, we discuss practical implications for industrial quality control, where both reliability and transparency are critical.
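
As a rough illustration of the evaluation protocol named in the abstract, the sketch below shows how nested K-Fold index splits and the three reported metrics (accuracy, F1-score, AUC) could be computed in plain Python. This is not the thesis code, which is not part of this record. The function names, fold counts, seeds, and the convention that label 1 means "anomalous" are all assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nested_k_fold(n_samples, outer_k=5, inner_k=3, seed=0):
    """Yield (inner_splits, test_idx) for each outer fold.

    inner_splits is a list of (train_idx, val_idx) pairs drawn only from the
    outer training portion, so the outer test fold never influences model
    selection. Fold counts here are illustrative assumptions.
    """
    outer_folds = k_fold_indices(n_samples, outer_k, seed)
    for i, test_idx in enumerate(outer_folds):
        outer_train = [j for f, fold in enumerate(outer_folds) if f != i for j in fold]
        inner_folds = k_fold_indices(len(outer_train), inner_k, seed + i + 1)
        inner_splits = []
        for v, val_pos in enumerate(inner_folds):
            val_idx = [outer_train[p] for p in val_pos]
            train_idx = [
                outer_train[p]
                for w, fold in enumerate(inner_folds)
                if w != v
                for p in fold
            ]
            inner_splits.append((train_idx, val_idx))
        yield inner_splits, test_idx


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1-score for binary labels (assumed: 1 = anomalous)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def roc_auc(y_true, scores):
    """AUC as the fraction of (anomalous, normal) pairs ranked correctly.

    Ties between an anomalous and a normal score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In such a setup, hyperparameters would be chosen on the inner validation folds and the metrics reported only on the outer test folds, which is what keeps the reported scores from being optimistically biased.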

KARIMI, ARMIN

2024/2025

Record

	Faculty/Department
	
				Dipartimento di Ingegneria dell'Informazione - DEI
			
	Degree programme
	
				COMPUTER ENGINEERING, Master's degree (Laurea Magistrale, D.M. 270/2004)
			
	Academic year
	
				2024
			
	English title
	
				Anomaly Detection for Industry 5.0: A Comparative Study of Supervised, Unsupervised, and Multimodal Approaches in Human-Centered Turning Processes
			
	Keywords
	
				anomaly detection
				zero-shot learning
				supervised learning
			
	Supervisor
	
				PRETTO, ALBERTO
			
	Appears in collections:
	
				Master's degree theses

Files in this item:

File	Size	Format
Karimi_Armin.pdf (restricted access)	4.99 MB	Adobe PDF

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/98770