Vision-Language Models for Waste Sorting: a comparative study
BERTAN, KABIR
2024/2025
Abstract
Vision-Language Models (VLMs) have recently emerged as powerful deep learning frameworks capable of generalizing to novel scenarios and tasks in zero-shot or few-shot settings. Despite their impressive performance on benchmark datasets, their practical applicability and reliability in real-world environments remain uncertain. This study investigates the generalization capabilities of VLMs in the context of a complex real-world task: automated waste sorting. After reviewing the most prominent open-source VLMs available in the current scientific literature, we evaluate their performance in detecting contaminants within waste images collected from operational sorting plants. Several models are compared to identify their strengths and limitations, offering insights into their suitability for deployment in real-world industrial applications.

| File | Size | Format |
|---|---|---|
| Bertan_Kabir.pdf (open access) | 21.91 MB | Adobe PDF |
https://hdl.handle.net/20.500.12608/93664