DOMAIN GENERALIZATION FOR SEMANTIC SEGMENTATION EXPLOITING VISION-LANGUAGE FEATURES
CAREDDU, LUCA
2024/2025
Abstract
Domain Generalization in Semantic Segmentation (DGSS) is an attractive open research field in Computer Vision. It tackles the drop in semantic segmentation performance that arises when predicting images from target datasets whose distribution differs greatly from that of the source dataset. Unlike Unsupervised Domain Adaptation (UDA), where the target images, albeit without their labels, can be exploited during training to bridge the domain shift, Domain Generalization relies solely on the source dataset at training time. Since manually labeling images for semantic segmentation is very time-consuming, in common settings the source dataset is made of synthetic images coming from video games (e.g., GTA5) or game engines (e.g., SELMA), and target datasets with real images (e.g., Cityscapes) are employed only at inference time. Lately, Vision-Language Models (VLMs) such as CLIP have shown remarkable generalization capabilities across many image classification datasets. Indeed, the rich semantics learned from textual supervision allow them to handle the domain shift at test time much better. Although some works have used these models to improve on previous DGSS-specialized models, only a few have exploited the text representations to drive the task. In this work, we assess the direct contribution of language in solving the DGSS task in all the generalization scenarios (i.e., synthetic-to-real, real-to-real, and synthetic-to-synthetic) by building a model that employs VLMs as encoders to operate on image-text data and two decoders, one for each modality, that fuse the heterogeneous representations and solve the segmentation task.
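The sketch below illustrates the kind of dual-encoder, dual-decoder layout the abstract describes: frozen VLM encoders supply image and text features, a vision decoder refines the dense image features, and a text decoder fuses class (text) embeddings with them before producing per-pixel logits. It is a minimal, hedged illustration in PyTorch, not the thesis implementation; the stand-in encoders, module names, dimensions, and fusion strategy are all assumptions.

```python
# Minimal sketch (not the thesis architecture): VLM-style image/text encoders,
# one decoder per modality, and similarity-based fusion into segmentation logits.
import torch
import torch.nn as nn

class VLMSegmenter(nn.Module):
    def __init__(self, num_classes: int, embed_dim: int = 512):
        super().__init__()
        # Stand-ins for frozen CLIP-like towers; real code would load
        # pretrained image/text encoders and keep them frozen.
        self.image_encoder = nn.Conv2d(3, embed_dim, kernel_size=16, stride=16)
        self.text_embeddings = nn.Parameter(torch.randn(num_classes, embed_dim))

        # Vision decoder: refines the dense image features.
        self.pixel_decoder = nn.Sequential(
            nn.Conv2d(embed_dim, embed_dim, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(embed_dim, embed_dim, 3, padding=1),
        )
        # Text decoder: class (text) queries attend to the image tokens.
        self.text_decoder = nn.TransformerDecoderLayer(
            d_model=embed_dim, nhead=8, batch_first=True
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        b, _, h, w = images.shape
        feats = self.image_encoder(images)             # (B, D, H/16, W/16)
        feats = self.pixel_decoder(feats)              # refined dense features

        # Fuse modalities: text queries cross-attend to flattened image tokens.
        tokens = feats.flatten(2).transpose(1, 2)      # (B, N, D)
        queries = self.text_embeddings.unsqueeze(0).expand(b, -1, -1)
        class_embeds = self.text_decoder(queries, tokens)  # (B, C, D)

        # Per-pixel logits as pixel-embedding / class-embedding similarity.
        logits = torch.einsum("bdhw,bcd->bchw", feats, class_embeds)
        return nn.functional.interpolate(
            logits, size=(h, w), mode="bilinear", align_corners=False
        )

# Usage: 19 Cityscapes-style classes, one dummy batch.
model = VLMSegmenter(num_classes=19)
out = model(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 19, 224, 224])
```

Scoring pixels against text-derived class embeddings, rather than a fixed classification head, is the usual way such models let the language side drive the segmentation task.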
| File | Size | Format |
|---|---|---|
| Careddu_Luca.pdf (open access) | 13.79 MB | Adobe PDF |
https://hdl.handle.net/20.500.12608/84780