Multi-View CNNs for Industrial Object Classification: From Synthetic Dataset Design to Transfer Learning and Fusion Strategies

FRIGO, GIANMARIA
2024/2025

Abstract

Industrial assembly lines increasingly rely on automated vision systems to sort thousands of visually similar components, yet collecting large, labeled, multi-view image sets for every part is impractical: parts often arrive directly from suppliers, with no prior access for training. This thesis investigates a synthetic-to-real pipeline for multi-view convolutional neural networks (MVCNNs) in which classifiers are trained on CAD models and then applied to real images captured in a five-camera imaging box. We developed a Blender-based renderer that generates a synthetic dataset of 80 parts with randomized poses, materials, lighting, and optics, providing diverse five-view samples for training. Using this dataset, we evaluate transfer learning with ImageNet-pretrained backbones, freezing strategies, fusion mechanisms, weight sharing, and several backbone families. Freezing the first three ResNet-50 stages matches the accuracy of full fine-tuning while improving training stability. Among fusion mechanisms, score-sum and deep early fusion transfer most reliably to real data. Full weight sharing across view branches improves robustness while reducing the parameter count. A backbone comparison shows that compact modern CNNs, such as ConvNeXt-Small, generalize best. Overall, the results demonstrate that synthetic training combined with judicious transfer learning, deep fusion, and full weight sharing yields near-perfect real-world accuracy with a modest model footprint. However, the study is limited to 80 synthetic classes and a real evaluation set built from a single physical object, whereas the target application involves up to 10,000 categories. The findings should therefore be regarded as preliminary: they establish a baseline and outline a scalable route toward industrial deployment with reduced data-collection overhead.
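To make the pipeline concrete, two brief sketches follow. The first illustrates the kind of per-sample domain randomization a Blender-based renderer of this sort applies before rendering the five views. It is a minimal sketch assuming Blender's bpy API; the object names ("Part", "KeyLight") and all value ranges are hypothetical placeholders, not the parameters actually used in the thesis.

```python
import math
import random
import bpy  # Blender's Python API; run inside Blender

def randomize_scene(part_name="Part"):
    """Randomize pose, material, lighting, and optics for one synthetic sample.

    Object names and value ranges are illustrative assumptions only.
    """
    # Random pose: uniform rotation around each axis.
    part = bpy.data.objects[part_name]
    part.rotation_euler = [random.uniform(0.0, 2.0 * math.pi) for _ in range(3)]

    # Random material: jitter the roughness of a Principled BSDF, if present.
    mat = part.active_material
    if mat and mat.use_nodes:
        bsdf = mat.node_tree.nodes.get("Principled BSDF")
        if bsdf:
            bsdf.inputs["Roughness"].default_value = random.uniform(0.1, 0.9)

    # Random lighting: vary the strength of the key light.
    bpy.data.objects["KeyLight"].data.energy = random.uniform(200.0, 1500.0)

    # Random optics: jitter the focal length of each of the five cameras.
    for obj in bpy.data.objects:
        if obj.type == 'CAMERA':
            obj.data.lens = random.uniform(30.0, 60.0)  # focal length in mm
```

The second sketches the classifier side: a five-view MVCNN with a single, fully weight-shared ResNet-50 branch, the first three stages frozen, and score-sum fusion of the per-view class scores. It assumes PyTorch and torchvision; the exact stage split and all names are illustrative assumptions, not the thesis's configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class MultiViewScoreSum(nn.Module):
    """Five-view classifier: one shared ResNet-50 branch, score-sum fusion.

    Illustrative sketch; stage split and hyperparameters are assumptions.
    """

    def __init__(self, num_classes: int = 80) -> None:
        super().__init__()
        # One ImageNet-pretrained backbone shared by all five view branches:
        # full weight sharing keeps the parameter count of a single CNN.
        self.backbone = models.resnet50(
            weights=models.ResNet50_Weights.IMAGENET1K_V2
        )
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, num_classes)

        # Freeze the stem and the first three residual stages (layer1-layer3);
        # only layer4 and the new classifier head are fine-tuned.
        for module in (self.backbone.conv1, self.backbone.bn1,
                       self.backbone.layer1, self.backbone.layer2,
                       self.backbone.layer3):
            for param in module.parameters():
                param.requires_grad = False

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (batch, num_views, 3, H, W), one image per camera.
        b, v, c, h, w = views.shape
        per_view_logits = self.backbone(views.reshape(b * v, c, h, w))
        # Score-sum fusion: add the per-view class scores.
        return per_view_logits.reshape(b, v, -1).sum(dim=1)

model = MultiViewScoreSum(num_classes=80)
batch = torch.randn(2, 5, 3, 224, 224)  # two samples, five views each
print(model(batch).shape)                # -> torch.Size([2, 80])
```

Because the backbone is shared, the model footprint stays that of a single ResNet-50 regardless of the number of views, which is one reason full weight sharing pairs naturally with the compact backbones the thesis recommends.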
Keywords: CNN, Multi-View, Synthetic Data, Transfer Learning, Fusion
File: Frigo_Gianmaria.pdf (Adobe PDF, 4.81 MB, open access)

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/92194