Hybrid Encoder and Architectures for Advanced Monocular Depth Estimation: A Comparative Synthesis Approach
DI LABBIO, DANIELA
2024/2025
Abstract
Monocular depth estimation is a critical capability in computer vision systems, with applications ranging from autonomous driving to augmented reality. Despite significant advances in the field, current solutions often trade model performance against computational efficiency, which limits their practical deployment. This thesis introduces an architectural integration approach that combines the complementary strengths of two leading models, Xi-Net and Lite-Mono, to address this challenge. The methodology is a systematic evaluation of architectural fusion strategies, analyzing how components from both source models can be combined while preserving their respective strengths. We conduct extensive experiments on the KITTI dataset, a benchmark suite for autonomous driving applications, to validate our approach across diverse real-world scenarios, including urban, residential, and highway environments. The model variants we develop offer different trade-offs between performance and resource utilization, making them suitable for deployment scenarios ranging from mobile devices to more powerful computing platforms. This work contributes to the field of computer vision by proposing an approach to model integration that could serve as a blueprint for future efforts to build efficient, high-performance vision systems. Our findings also advance the understanding of architectural design principles that allow models to scale across different computational constraints, potentially enabling broader adoption of advanced computer vision capabilities in resource-constrained environments.

File | Size | Format |
---|---|---|
DiLabbio_Daniela.pdf (open access) | 19.39 MB | Adobe PDF |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/81802