Exploiting Retentive Networks in 3D LiDAR Semantic Segmentation
MOSCO, SIMONE
2022/2023
Abstract
The advent of LiDAR technology has transformed autonomous driving, robotics, and environmental monitoring by providing precise 3D point cloud data. Semantic segmentation of LiDAR point clouds is an essential task for understanding the surrounding environment and supporting intelligent decision-making in these applications. This Master's thesis introduces RangeRet, a new approach to 3D LiDAR semantic segmentation that operates on range images to achieve real-time performance. It also exploits Retentive Networks, an architecture originally proposed in Natural Language Processing (NLP) as a foundation for Large Language Models. Building on a comprehensive study of semantic segmentation of 3D LiDAR data and of state-of-the-art methods, the thesis proposes a lightweight network designed to address key limitations of existing methods, with an emphasis on efficiency, memory management, and real-time operation, while achieving promising results. Applying Retentive Networks to computer vision offers an alternative to the established Transformer architecture, with the aim of better capturing geometric and spatial information in two-dimensional data such as range images. The proposed method is evaluated on benchmark datasets such as SemanticKITTI, analyzing accuracy, efficiency, and generalization across diverse scenarios. Finally, the thesis presents an ablation study that systematically isolates and evaluates the individual components of the proposed method. This analysis identifies the contribution of each architectural element, provides insight into the network's robustness, and highlights the key factors influencing performance.
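Range-image methods like the one summarized above first project each LiDAR scan onto a 2D grid, so that image-style networks can process it. The sketch below shows the spherical projection commonly used for this step, assuming a 64 x 2048 image and a vertical field of view from +3° to -25°, values often used for SemanticKITTI in RangeNet++-style pipelines. The function name `spherical_projection` and its parameters are hypothetical, and this is not claimed to be RangeRet's own preprocessing code; the thesis may use different resolutions or extra input channels.

```python
import math
from typing import List, Sequence, Tuple


def spherical_projection(
    points: Sequence[Tuple[float, float, float]],
    height: int = 64,            # assumed number of rows (laser beams)
    width: int = 2048,           # assumed number of columns (azimuth bins)
    fov_up_deg: float = 3.0,     # assumed upper vertical field of view
    fov_down_deg: float = -25.0, # assumed lower vertical field of view
) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    """Project (x, y, z) points onto a height x width range image.

    Returns the range image (-1.0 marks pixels hit by no point) and the
    (row, col) pixel of every input point, in input order.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = abs(fov_up) + abs(fov_down)  # total vertical field of view

    coords: List[Tuple[int, int]] = []
    depths: List[float] = []
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)  # range (depth) of the point
        yaw = -math.atan2(y, x)  # horizontal angle in [-pi, pi]
        # Vertical angle; the clamp guards against rounding, r == 0 maps to 0.
        pitch = math.asin(max(-1.0, min(1.0, z / r))) if r > 0 else 0.0
        # Normalize both angles to [0, 1], then scale to the image size.
        u = 0.5 * (yaw / math.pi + 1.0) * width
        v = (1.0 - (pitch + abs(fov_down)) / fov) * height
        # Integer pixel indices, clamped so out-of-FOV points stay on the image.
        col = min(width - 1, max(0, math.floor(u)))
        row = min(height - 1, max(0, math.floor(v)))
        coords.append((row, col))
        depths.append(r)

    image = [[-1.0] * width for _ in range(height)]
    # Write far-to-near so the closest point wins when several share a pixel.
    for i in sorted(range(len(depths)), key=lambda k: depths[k], reverse=True):
        row, col = coords[i]
        image[row][col] = depths[i]
    return image, coords
```

For example, the points (10, 0, 0) and (20, 0, 0) lie on the same ray and fall into the same pixel, (6, 1024); the image stores the nearer range, 10.0. Returning the per-point pixel coordinates is a deliberate choice: predictions made on the 2D image can then be mapped back to every original 3D point, including those hidden behind a closer point.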
Full text: Mosco_Simone.pdf (Adobe PDF, 23.5 MB), Open Access from 14/12/2024.
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 License.
https://hdl.handle.net/20.500.12608/60407