3D Data Processing with Deep Learning and Industrial Applications
BANO, EMANUELE
2023/2024
Abstract
In this thesis I explore the processing of 3D data and its industrial applications, using both traditional computer vision techniques and modern methods based on deep learning. Enabling a computer to sense, perceive, and interpret its surrounding environment is a challenging task that requires a mathematical framework. While most research has historically focused on 2D data, the recent availability of more affordable 3D sensors and the advancement of powerful deep learning tools have made it possible to tackle tasks that were previously out of reach with standard 2D techniques. The thesis is divided into three parts. The first part provides an overview of the theory and methods that form the foundation of the applications developed in the subsequent parts. It begins with techniques and sensors for acquiring 3D data, followed by a discussion of the different ways to represent this information. It then turns to high-level 3D computer vision tasks, covering both traditional approaches and modern techniques using deep learning networks. The second part presents a deep learning application that I developed to address a 3D classification task. The network architecture is inspired by the Orientation Boosted Voxel Net, where the network is trained to learn object rotations as an auxiliary task using a combined categorical cross-entropy loss function. The novelty of my design lies in the complete redefinition of the architecture, where I employed skip connections to enable a deeper network, thereby avoiding vanishing gradient problems and facilitating more abstract and effective feature extraction. The full implementation of the dataset, model, network training, and testing was carried out in Python. The third part of the thesis demonstrates the application of the methods discussed to the design of an industrial system that I developed during my internship at Innova Srl.
The aim was to create a general module capable of acquiring point clouds of objects moving on an industrial conveyor belt, followed by a specific processing module. An example application is detecting the 3D pose of bread on the conveyor to guide a robotic arm in making cuts that improve cooking properties. A key innovation of the data acquisition module was the replacement of the conventional setup of multiple static 3D profilometers with a new system that uses just two profilometers moving perpendicular to the belt’s velocity. This approach significantly reduces costs but introduces challenges related to the distortion caused by the relative movement between the profilometers and the objects, as well as to the stitching of subsequent scans. The processing component of the application was developed using the MVTec Halcon programming language and was integrated into a unified solution in C# that also manages communication with the sensors’ controller. Finally, the thesis illustrates how the processed data from the acquisition module can be used for robotic guidance applications, where 3D surface matching algorithms detect the target object and its pose within the scene and transmit this information to a robotic arm to perform a specific action.

File | Size | Format
---|---|---
Bano_Emanuele.pdf (open access) | 23.33 MB | Adobe PDF
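
The abstract describes training the network on object rotations as an auxiliary task through a combined categorical cross-entropy loss. A minimal sketch of what such a loss could look like is given below. It is not the thesis implementation: the function names, the weighted-sum form, and the `rot_weight` parameter are assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Categorical cross-entropy for a single sample with an integer label."""
    probs = softmax(logits)
    return -math.log(probs[target_index])


def combined_loss(class_logits, class_target, rot_logits, rot_target,
                  rot_weight=0.5):
    """Weighted sum of the main and auxiliary losses.

    ASSUMPTION: a simple weighted sum with a hypothetical weight `rot_weight`;
    the actual weighting used in the thesis is not stated in the abstract.
    """
    class_term = cross_entropy(class_logits, class_target)  # object category
    rot_term = cross_entropy(rot_logits, rot_target)        # orientation bin
    return class_term + rot_weight * rot_term


if __name__ == "__main__":
    # Hypothetical sample: 4 object classes, 2 orientation bins.
    loss = combined_loss([2.0, 0.1, -1.0, 0.3], 0, [0.2, 1.5], 1)
    print(f"combined loss: {loss:.4f}")
```

In this sketch the orientation term acts as a regularizer on the shared features: setting `rot_weight` to 0 recovers plain classification training.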
https://hdl.handle.net/20.500.12608/73121