Multimodal Domain Adaptation for Point Cloud Semantic Segmentation
Nicoletti, Gianpietro
2022/2023
Abstract
Semantic segmentation can be made more reliable and accurate by exploiting multimodal datasets, i.e. by combining heterogeneous data such as RGB images with LIDAR sequences or depth maps. However, training neural networks to solve this task effectively requires a large amount of annotated data. Obtaining accurate and complete annotations for every image and LIDAR sample in a dataset (i.e. a label for each pixel or 3D point) can be a significant challenge, particularly for complex problems. To address the lack of annotations, Unsupervised Domain Adaptation (UDA) has been studied and developed in recent years: it assumes that labels are not available for the target dataset and supervises the training of the model using the labels of the source dataset. Using multimodal data in the UDA setting is challenging, since the different modalities can be affected differently by the domain shift. In this work, autonomous driving datasets are explored to develop UDA techniques, with a specific emphasis on adapting between RGB-Depth (RGB-D) and RGB-LIDAR data. For all the datasets considered, the information from both 2D images and 3D data (RGB-D or RGB-LIDAR) was used. The results demonstrate the effectiveness of adapting between different datasets and highlight how the accuracy on 3D data can be enhanced by leveraging the information provided by the images.
File | Size | Format
---|---|---
Nicoletti_Gianpietro.pdf (Open Access from 26/10/2024) | 10.57 MB | Adobe PDF
https://hdl.handle.net/20.500.12608/56237