Exploiting Retentive Networks in 3D LiDAR Semantic Segmentation

MOSCO, SIMONE
2022/2023

Abstract

The advent of LiDAR technology has transformed autonomous driving, robotics, and environmental monitoring by providing precise 3D point cloud data. Semantic segmentation of LiDAR point clouds is essential for understanding the environment and supporting intelligent decision-making in these applications. This Master's thesis introduces RangeRet, an approach to 3D LiDAR semantic segmentation that uses range images to achieve real-time performance. It also exploits Retentive Networks, a recent Natural Language Processing (NLP) architecture designed for Large Language Models. Building on a comprehensive study of semantic segmentation for 3D LiDAR data and of state-of-the-art methods, the thesis proposes a lightweight network designed to address key limitations of existing methods. It emphasizes efficiency, memory management, and real-time usage while achieving promising results. Applying Retentive Networks to computer vision tasks offers an alternative to the established Transformer architecture, with the aim of better capturing geometric and spatial information in two-dimensional objects. The proposed method is evaluated on benchmark datasets such as SemanticKITTI in terms of accuracy, efficiency, and generalization across diverse scenarios. In the final section, an ablation study systematically isolates and evaluates individual components of the method. This identifies the contribution of each architectural element, provides insight into the network's robustness, and highlights the key factors influencing performance.
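For readers unfamiliar with range images, the following is a minimal sketch of the spherical projection commonly used to turn a LiDAR scan into a 2D range image. It is not taken from the thesis: the function name `project_to_range_image`, the 64 x 1024 image size, and the +3/-25 degree vertical field of view (typical of the 64-beam sensor used for SemanticKITTI) are assumptions chosen for illustration, and RangeRet's actual preprocessing may differ.

```python
import math


def project_to_range_image(points, height=64, width=1024,
                           fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project (x, y, z) points onto a height x width range image.

    Hypothetical helper for illustration only. Pixels that receive no
    point keep the value -1.0; when several points fall into the same
    pixel, the nearest one is kept.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down  # total vertical field of view in radians

    image = [[-1.0] * width for _ in range(height)]
    for x, y, z in points:
        depth = math.sqrt(x * x + y * y + z * z)
        if depth == 0.0:
            continue  # a point at the sensor origin has no direction

        yaw = math.atan2(y, x)        # horizontal angle in [-pi, pi]
        pitch = math.asin(z / depth)  # vertical angle

        # Map the angles to continuous pixel coordinates.
        u = 0.5 * (1.0 - yaw / math.pi) * width           # column
        v = (1.0 - (pitch - fov_down) / fov) * height     # row

        # Clamp to the image bounds (points outside the assumed field
        # of view end up on the border rows).
        col = min(width - 1, max(0, int(math.floor(u))))
        row = min(height - 1, max(0, int(math.floor(v))))

        if image[row][col] < 0.0 or depth < image[row][col]:
            image[row][col] = depth
    return image


if __name__ == "__main__":
    scan = [(20.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
    img = project_to_range_image(scan)
    filled = sum(1 for r in img for d in r if d >= 0.0)
    print(f"{filled} pixels filled out of {len(img) * len(img[0])}")
```

Once the scan is stored as a dense 2D grid, image-style networks can process it directly, which is the usual motivation for range-image methods when real-time operation matters.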
2022
3D Segmentation
Retentive Networks
Range imaging
Files in this item:
File: Mosco_Simone.pdf
Access: embargoed until 13/12/2024
Size: 23.5 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/60407