Anomaly Detection for Industry 5.0: A Comparative Study of Supervised, Unsupervised, and Multimodal Approaches in Human-Centered Turning Processes
KARIMI, ARMIN
2024/2025
Abstract
This thesis investigates anomaly detection in industrial products, with a particular focus on the metal manufacturing sector. Using high-resolution image data provided by Zannini S.p.A., we evaluate a wide spectrum of techniques, including supervised, unsupervised, and semi-supervised learning methods, alongside emerging approaches based on vision-language models (VLMs), zero-shot anomaly detection, and multimodal large language models (MLLMs). In particular, we implement Anomaly-OV, a recent framework for zero-shot anomaly detection and reasoning that supports both anomaly localization and natural language explanation. Our methodology combines nested K-Fold cross-validation with extensive data augmentation techniques such as gamma correction, rotation, zoom, and flipping to improve robustness and generalization. Model performance is systematically assessed using standard metrics including accuracy, AUC, and F1-score. Some approaches rely exclusively on image data, while others integrate image and text modalities, enabling interactive chatbot-style querying of anomalies during inference. The experimental results indicate that supervised techniques consistently achieve the highest detection accuracy, but their effectiveness depends on access to large labeled datasets, which are costly and difficult to obtain in practice. Unsupervised approaches, relying only on normal samples, ease the labeling burden but generally produce lower precision. Zero-shot and MLLM-based methods can operate with minimal training data, offering adaptability and interpretability, albeit with somewhat lower accuracy than supervised methods. We compare these approaches and highlight the trade-offs between accuracy, the amount of training data required, and explainability. While supervised methods remain the most effective for accuracy, MLLM-based techniques contribute valuable interpretability and interactive reasoning.
Finally, we discuss practical implications for industrial quality control, where both reliability and transparency are critical.

| File | Access | Size | Format |
|---|---|---|---|
| Karimi_Armin.pdf | Restricted access | 4.99 MB | Adobe PDF |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/98770