Improving Zero-Shot Multi-Object Navigation by Enhancing Object Detection in LLM-Based Embodied AI Frameworks

RAJABI, MOHAMMADKAZEM

2024/2025

Abstract

This thesis investigates object detection in zero-shot and open-set classification settings, focusing on the integration of object detectors into large language model (LLM)-based systems. The study first evaluates the MultiON framework for zero-shot object detection, assessing its ability to recognize novel objects without explicit training. It then incorporates YOLO (You Only Look Once) and DETR (DEtection TRansformer) models to improve detection, particularly in open-set classification scenarios. A significant portion of the thesis is devoted to constructing a domain-specific dataset for detecting cylindrical objects. The dataset is designed to address the challenges of fine-grained detection, with attention to diversity and class balance for effective model training. YOLO and DETR models are fine-tuned on this dataset and evaluated using metrics such as precision, recall, and mean Average Precision (mAP). The findings highlight the strengths and limitations of current object detection frameworks on open-set classification tasks and offer practical guidance for deploying them in real-world applications. This work helps bridge the gap between object detection and LLM-based frameworks, a step toward more general and adaptable AI systems.

Record

Faculty/Department	Dipartimento di Matematica "Tullio Levi-Civita" - DM
Degree programme	Computer Science, Master's degree (Laurea Magistrale, D.M. 270/2004)
Academic year	2024
English title	Improving Zero-Shot Multi-Object Navigation by Enhancing Object Detection in LLM-Based Embodied AI Frameworks
Keywords	MultiON; Zero-Shot; LLM-Based Frameworks
Supervisor	BALLAN, LAMBERTO
Appears in collections	Master's theses (Lauree magistrali)

Files in this item:

File	Size	Format
FinalThesis_Tango.pdf (open access)	20.74 MB	Adobe PDF

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/91856