Improving Zero-Shot Multi-Object Navigation by Enhancing Object Detection in LLM-Based Embodied AI Frameworks
RAJABI, MOHAMMADKAZEM
2024/2025
Abstract
This thesis explores advances in object detection for zero-shot and open-set classification, with a specific focus on integrating object detectors into large language model (LLM)-based systems. The study begins by evaluating the MultiON framework for zero-shot object detection, assessing how well it recognizes novel objects without explicit training. Building on this foundation, the research then incorporates YOLO (You Only Look Once) and DETR (DEtection TRansformer) models to improve object detection, especially in open-set classification scenarios. A significant portion of the thesis is devoted to constructing a domain-specific dataset for detecting cylindrical objects. The dataset is designed to address the challenges of fine-grained detection, with an emphasis on diversity and balance for effective model training. YOLO and DETR models are fine-tuned on this dataset and evaluated on precision, recall, and mean Average Precision (mAP). The findings highlight the strengths and limitations of current object detection frameworks in open-set classification tasks and offer practical guidance for deploying them in real-world applications. This work helps bridge the gap between object detection and LLM-based frameworks, a step toward more general and adaptable AI systems.

| File | Size | Format |
|---|---|---|
| FinalThesis_Tango.pdf (open access) | 20.74 MB | Adobe PDF |
The text of this website © Università degli Studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/91856