Evaluation and comparison of layout-guided text-to-image diffusion models
VEZZARO, DAVIDE
2023/2024
Abstract
This study investigates text-to-image diffusion models, which generate images from textual descriptions. While promising for applications such as art and creative content creation, these models face challenges, particularly missing elements or incorrect attribute bindings when handling complex scenes with multiple elements and spatial relationships. To address these limitations, layout-guided models have emerged, allowing users to specify layouts (e.g., bounding boxes) alongside captions for better control. However, while much research has focused on text-to-image models, layout-guided models remain underexplored. This work focuses on evaluating bounding-box layout-guided models, crafting a new prompt collection annotated with bounding boxes and a novel evaluation pipeline to measure the layout accuracy of these models. Finally, a comparison across all the models was performed. Resources, including prompts, code, and results, are publicly available.
File: Vezzaro_Davide.pdf (open access) | Size: 15.51 MB | Format: Adobe PDF
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/80214