Improving Zero-Shot Multi-Object Navigation by Enhancing Object Detection in LLM-Based Embodied AI Frameworks

RAJABI, MOHAMMADKAZEM
2024/2025

Abstract

This thesis investigates object detection in zero-shot and open-set classification settings, focusing on the integration of object detectors into large language model (LLM)-based systems. The study first evaluates the MultiON framework for zero-shot object detection, that is, recognizing novel objects without explicit training on them. It then incorporates YOLO (You Only Look Once) and DETR (DEtection TRansformer) models to improve detection, particularly in open-set classification scenarios. A significant portion of the thesis is devoted to constructing a domain-specific dataset for detecting cylindrical objects, designed to be diverse and balanced so that it supports fine-grained detection and effective model training. YOLO and DETR models are fine-tuned on this dataset and evaluated with precision, recall, and mean Average Precision (mAP). The findings highlight the strengths and limitations of current object detection frameworks on open-set classification tasks and offer guidance for their deployment in real-world applications. The work contributes to bridging the gap between object detection and LLM-based frameworks, a step toward more general and adaptable AI systems.
2024
Keywords: MultiON; Zero-Shot; LLM-Based Frameworks
Files in this item:
FinalThesis_Tango.pdf (open access, 20.74 MB, Adobe PDF)

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/91856