Hybrid Encoder and Architectures for Advanced Monocular Depth Estimation: A Comparative Synthesis Approach

Monocular depth estimation represents a critical capability in computer vision systems, with applications ranging from autonomous driving to augmented reality. Despite significant advances in this field, current solutions often require a trade-off between model performance and computational efficiency, limiting their practical deployment. This thesis introduces an architectural integration approach that combines the complementary strengths of two leading models, Xi-Net and Lite-Mono, to address this fundamental challenge. The methodology involves a systematic evaluation of architectural fusion strategies, analyzing how components from both source models can be combined effectively while preserving their respective strengths. We conduct extensive experiments on the KITTI dataset, a comprehensive benchmark suite for autonomous driving applications, to validate our approach across diverse real-world scenarios including urban, residential, and highway environments. The model variants we develop offer different trade-offs between performance and resource utilization, making them suitable for a range of deployment scenarios from mobile devices to more powerful computing platforms. This work contributes to the field of computer vision by establishing a new paradigm for model integration that could serve as a blueprint for future efforts in creating efficient, high-performance vision systems. Furthermore, our findings advance the understanding of architectural design principles that enable effective model scaling across different computational constraints, potentially enabling broader adoption of advanced computer vision capabilities in resource-constrained environments.

DI LABBIO, DANIELA

2024/2025

	Faculty/Department
	
				Dipartimento di Matematica "Tullio Levi-Civita" - DM
			
	Degree programme
	
				Data Science, Master's degree (Laurea Magistrale, D.M. 270/2004)
			
	Academic year
	
				2024
			
	Keywords
	
				Vision
				Depth
				Estimation
				Optimization
			
	Supervisor
	
				BALLAN, LAMBERTO
			
	Appears in collections:
	
				Master's theses

Files in this item:

File	Size	Format
DiLabbio_Daniela.pdf (open access)	19.39 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/81802