Multi-View CNNs for Industrial Object Classification: From Synthetic Dataset Design to Transfer Learning and Fusion Strategies

FRIGO, GIANMARIA
2024/2025

Abstract

Industrial assembly lines increasingly rely on automated vision systems to sort thousands of visually similar components, yet collecting large, labeled, multi-view image sets for every part is impractical: parts often arrive directly from suppliers, with no prior access for training. This thesis investigates a synthetic-to-real pipeline for multi-view convolutional neural networks (MVCNNs) in which classifiers are trained on CAD models and then applied to real images captured in a five-camera imaging box. We developed a Blender-based renderer that generates a synthetic dataset of 80 parts with randomized poses, materials, lighting, and optics, providing diverse five-view samples for training. Using this dataset, we evaluate transfer learning with ImageNet-pretrained backbones, freezing strategies, fusion mechanisms, weight sharing, and several backbone families. Freezing the first three ResNet-50 stages matches the accuracy of full fine-tuning while improving training stability. Among fusion mechanisms, score-sum and deep early fusion transfer most reliably to real data. Full weight sharing across view branches improves robustness while reducing the parameter count. A backbone comparison shows that compact modern CNNs, such as ConvNeXt-Small, generalize best. Overall, the results demonstrate that synthetic training combined with judicious transfer learning, deep fusion, and full weight sharing yields near-perfect real-world accuracy with a modest model footprint. However, the study is limited to 80 synthetic classes and a real evaluation set built from a single physical object, whereas the target application involves up to 10,000 categories. The findings should therefore be regarded as preliminary: they establish a baseline and outline a scalable route toward industrial deployment with reduced data-collection overhead.
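To make the pipeline concrete, two brief sketches follow. The first illustrates the kind of per-sample domain randomization a Blender-based renderer of this sort applies before rendering the five views. It is a minimal sketch assuming Blender's bpy API; the object names ("Part", "KeyLight") and all value ranges are hypothetical placeholders, not the parameters actually used in the thesis.

```python
import math
import random
import bpy  # Blender's Python API; run inside Blender

def randomize_scene(part_name="Part"):
    """Randomize pose, material, lighting, and optics for one synthetic sample.

    Object names and value ranges are illustrative assumptions only.
    """
    # Random pose: uniform rotation around each axis.
    part = bpy.data.objects[part_name]
    part.rotation_euler = [random.uniform(0.0, 2.0 * math.pi) for _ in range(3)]

    # Random material: jitter the roughness of a Principled BSDF, if present.
    mat = part.active_material
    if mat and mat.use_nodes:
        bsdf = mat.node_tree.nodes.get("Principled BSDF")
        if bsdf:
            bsdf.inputs["Roughness"].default_value = random.uniform(0.1, 0.9)

    # Random lighting: vary the strength of the key light.
    bpy.data.objects["KeyLight"].data.energy = random.uniform(200.0, 1500.0)

    # Random optics: jitter the focal length of each of the five cameras.
    for obj in bpy.data.objects:
        if obj.type == 'CAMERA':
            obj.data.lens = random.uniform(30.0, 60.0)  # focal length in mm
```

The second sketches the classifier side: a five-view MVCNN with a single, fully weight-shared ResNet-50 branch, the first three stages frozen, and score-sum fusion of the per-view class scores. It assumes PyTorch and torchvision; the exact stage split and all names are illustrative assumptions, not the thesis's configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class MultiViewScoreSum(nn.Module):
    """Five-view classifier: one shared ResNet-50 branch, score-sum fusion.

    Illustrative sketch; stage split and hyperparameters are assumptions.
    """

    def __init__(self, num_classes: int = 80) -> None:
        super().__init__()
        # One ImageNet-pretrained backbone shared by all five view branches:
        # full weight sharing keeps the parameter count of a single CNN.
        self.backbone = models.resnet50(
            weights=models.ResNet50_Weights.IMAGENET1K_V2
        )
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, num_classes)

        # Freeze the stem and the first three residual stages (layer1-layer3);
        # only layer4 and the new classifier head are fine-tuned.
        for module in (self.backbone.conv1, self.backbone.bn1,
                       self.backbone.layer1, self.backbone.layer2,
                       self.backbone.layer3):
            for param in module.parameters():
                param.requires_grad = False

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (batch, num_views, 3, H, W), one image per camera.
        b, v, c, h, w = views.shape
        per_view_logits = self.backbone(views.reshape(b * v, c, h, w))
        # Score-sum fusion: add the per-view class scores.
        return per_view_logits.reshape(b, v, -1).sum(dim=1)

model = MultiViewScoreSum(num_classes=80)
batch = torch.randn(2, 5, 3, 224, 224)  # two samples, five views each
print(model(batch).shape)                # -> torch.Size([2, 80])
```

Because the backbone is shared, the model footprint stays that of a single ResNet-50 regardless of the number of views, which is one reason full weight sharing pairs naturally with the compact backbones the thesis recommends.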
Keywords: CNN, Multi-View, Synthetic Data, Transfer Learning, Fusion
File: Frigo_Gianmaria.pdf (Adobe PDF, 4.81 MB, open access)

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/92194