3D Data Processing with Deep Learning and Industrial Applications

BANO, EMANUELE

2023/2024

Abstract

In this thesis I explore the processing of 3D data and its industrial applications, using both traditional computer vision techniques and modern methods based on deep learning. Enabling a computer to sense, perceive, and interpret its surrounding environment is a challenging task that requires a mathematical framework. While most research has historically focused on 2D data, the recent availability of more affordable 3D sensors and the advancement of powerful deep learning tools have made it possible to tackle tasks that were previously out of reach with standard 2D techniques.

The thesis is divided into three parts. The first part provides an overview of the theory and methods that form the foundation of the applications developed in the subsequent parts. It begins with techniques and sensors for acquiring 3D data, followed by a discussion of the different ways to represent this information. It then covers high-level 3D computer vision tasks, including both traditional approaches and modern techniques based on deep learning networks.

The second part presents a deep learning application that I developed to address a 3D classification task. The network architecture is inspired by the Orientation Boosted Voxel Net, in which the network is trained to learn object rotations as an auxiliary task using a combined categorical cross-entropy loss function. The novelty of my design lies in the complete redefinition of the architecture: I employed skip connections to enable a deeper network, thereby avoiding vanishing gradient problems and facilitating more abstract and effective feature extraction. The dataset handling, model, network training, and testing were all implemented in Python.

The third part of the thesis demonstrates the application of the methods discussed to the design of an industrial system that I developed during my internship at Innova Srl. The aim was to create a general module capable of acquiring point clouds of objects moving on an industrial conveyor belt, followed by an application-specific processing module. An example application is detecting the 3D pose of bread on the conveyor to guide a robotic arm in making cuts that improve cooking properties. A key innovation of the data acquisition module was the replacement of the conventional setup of multiple static 3D profilometers with a new system that uses just two profilometers moving perpendicularly to the belt’s velocity. This approach significantly reduces costs, but it introduces challenges related to the distortion caused by the relative movement between the profilometers and the objects, as well as to the stitching of successive scans. The processing component of the application was developed in the MVTec Halcon programming language and integrated into a unified C# solution that also manages communication with the sensors’ controller. Finally, the thesis illustrates how the processed data from the acquisition module can be used for robotic guidance applications, in which 3D surface matching algorithms detect the target object and its pose within the scene and transmit this information to a robotic arm to perform a specific action.
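
The combined categorical cross-entropy loss mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes that object orientation is discretized into bins and that the class loss and the auxiliary orientation loss are combined as a weighted sum. The function names and the `aux_weight` parameter are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Categorical cross-entropy for one sample with an integer label."""
    probs = softmax(logits)
    return -math.log(probs[target_index])


def combined_loss(class_logits, class_target, rot_logits, rot_target,
                  aux_weight=0.5):
    """Class loss plus a weighted auxiliary orientation loss.

    aux_weight is a hypothetical hyperparameter (assumption), not a
    value taken from the thesis.
    """
    main = cross_entropy(class_logits, class_target)
    aux = cross_entropy(rot_logits, rot_target)
    return main + aux_weight * aux


if __name__ == "__main__":
    # Uniform predictions over 4 classes and 8 orientation bins.
    print(combined_loss([0.0] * 4, 0, [0.0] * 8, 3))
```

With uniform predictions, the result equals log 4 + 0.5 · log 8, the sum of the two cross-entropy terms under the assumed weighting. Setting `aux_weight` to zero reduces the objective to plain object classification.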

	Faculty/Department
	
				Dipartimento di Ingegneria dell'Informazione - DEI
			
	Degree programme
	
				CONTROL SYSTEMS ENGINEERING, Master's degree (Laurea Magistrale, D.M. 270/2004)
			
	Academic year
	
				2023
			
	English title
	
				3D Data Processing with Deep Learning and Industrial Applications
			
	Keywords
	
				3D Data Processing
Deep Learning
Profilometers
3D matching
Computer Vision
			
	Supervisor
	
				BOSCHETTI, GIOVANNI
			
	Appears in collections:
	
				Master's theses

Files in this item:

File	Size	Format
Bano_Emanuele.pdf (open access)	23.33 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/73121