Multimodal Domain Adaptation for Point Cloud Semantic Segmentation

NICOLETTI, GIANPIETRO
2022/2023

Abstract

Semantic segmentation can be made more reliable and accurate by exploiting multimodal datasets, i.e., by combining heterogeneous data such as RGB images with LiDAR sequences or depth maps. However, training neural networks that solve the task effectively requires a large amount of annotated data, and obtaining accurate and complete annotations for every image and LiDAR sample in a dataset (i.e., a label for each pixel or 3D point) can be a significant challenge, particularly for complex problems. To tackle the lack of annotations, Unsupervised Domain Adaptation (UDA) has been studied and developed in recent years: labels are assumed to be unavailable for the target dataset, and the training of the model is supervised using the labels of the source dataset. Using multimodal data in the UDA setting is challenging, since the different modalities can be affected differently by the domain shift. In this work, autonomous driving datasets are explored to develop UDA techniques, with a specific emphasis on adapting across RGB-Depth (RGB-D) and RGB-LiDAR data. For all the datasets considered, the information from the 2D images and from the 3D data (depth maps or LiDAR point clouds) was used jointly. The results demonstrate the effectiveness of adapting between different datasets and highlight how the segmentation accuracy on 3D data can be enhanced by leveraging the information provided by the images.
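
The UDA setting summarised above can be made concrete with a short training-step sketch in PyTorch: the labelled source batch supervises both the 2D and the 3D branch with cross-entropy, while on the unlabelled target batch a cross-modal consistency term (in the spirit of cross-modal UDA methods such as xMUDA) lets each modality mimic the prediction of the other. This is only an illustrative sketch under these assumptions; net_2d, net_3d, the batch fields and lambda_xm are placeholder names, not the architecture or losses actually used in the thesis.

# Minimal sketch of one multimodal UDA training step (placeholder names,
# xMUDA-style cross-modal consistency; not the exact method of the thesis).
import torch
import torch.nn.functional as F

def train_step(net_2d, net_3d, optimizer, src_batch, tgt_batch, lambda_xm=0.1):
    optimizer.zero_grad()

    # --- Source domain: supervised by ground-truth labels (per pixel / per point) ---
    src_logits_2d = net_2d(src_batch["image"])    # assumed (N_pts, C) after 2D-to-3D projection
    src_logits_3d = net_3d(src_batch["points"])   # assumed (N_pts, C)
    loss_src = (F.cross_entropy(src_logits_2d, src_batch["labels"]) +
                F.cross_entropy(src_logits_3d, src_batch["labels"]))

    # --- Target domain: no labels; align the two modalities instead ---
    tgt_logits_2d = net_2d(tgt_batch["image"])
    tgt_logits_3d = net_3d(tgt_batch["points"])
    # Each branch mimics the (detached) prediction of the other modality.
    loss_xm = (F.kl_div(F.log_softmax(tgt_logits_2d, dim=1),
                        F.softmax(tgt_logits_3d.detach(), dim=1),
                        reduction="batchmean") +
               F.kl_div(F.log_softmax(tgt_logits_3d, dim=1),
                        F.softmax(tgt_logits_2d.detach(), dim=1),
                        reduction="batchmean"))

    loss = loss_src + lambda_xm * loss_xm
    loss.backward()
    optimizer.step()
    return loss.item()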

Keywords
Domain Adaptation
Segmentation
Point Clouds
Multimodal
Deep Learning
Files in this item:

File: Nicoletti_Gianpietro.pdf (embargo until 25/10/2024)
Size: 10.57 MB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/56237