Efficient Auto-Labeling for Enhanced Semantic Segmentation Annotation: An Ensemble-Based Framework with SAM Refinement

AKBARI MOAFI, MOHAMMADHOSSEIN
2024/2025

Abstract

Accurate and scalable annotation workflows are critical for training high-performance segmentation models, especially in domains such as waste recycling, where manual labeling is labor-intensive. This thesis presents a hybrid framework that automates segmentation annotation by combining semantic segmentation ensembles with instance-level refinement using the Segment Anything Model (SAM). Three state-of-the-art segmentation models (UperNet, OCRNet, and DeepLabv3+) are fused through a suite of ensemble strategies, including Majority Voting (MV), Logit-Based Fusion (LBF), and the proposed Signed Distance Function-Based Mask Retrieval (SDF-MR), which uses signed distance functions to fuse masks in a geometrically consistent way. These strategies support model-wise and class-wise weighting schemes to improve flexibility and robustness. To sharpen object boundaries and resolve ambiguities around complex objects, the framework integrates two adaptive SAM-based refinement strategies: Precomputed Mask-Based Refinement (PMR) and Connected Component-Based Refinement (CCR). Experimental results show consistent improvements in segmentation quality, particularly for underrepresented classes, highlighting the framework’s potential for offline, automatic labeling in data-constrained environments. The core contribution of this thesis is a modular, end-to-end Python framework that automates ensemble fusion, SAM-based refinement, and internal evaluation. The system supports post-hoc analysis with metrics such as mean Intersection over Union (mIoU) and pixel accuracy, so that annotators can assess how well each strategy works in their domain. Although tailored to waste recycling, the framework’s lightweight and extensible design makes it suitable for adaptation to other structured image domains with minimal retraining.
By bridging ensemble learning and foundation-model-based refinement, this work delivers a practical, scalable solution for generating high-quality segmentation annotations, particularly in data- and resource-constrained environments.
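
The two standard fusion strategies and the two metrics named in the abstract can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the function names, the weight shapes (one weight per model, and an optional models-by-classes weight matrix), and the choice to average softmax scores in the logit-based variant are all assumptions made for illustration. SDF-MR and the SAM-based refinements are not sketched, since the abstract does not specify them.

```python
import numpy as np


def weighted_majority_vote(label_maps, num_classes, model_weights=None):
    """Fuse per-model label maps (each H x W, integer class ids) by weighted voting.

    Hypothetical helper: ties are broken in favour of the lowest class id.
    """
    label_maps = np.stack(label_maps)                          # (M, H, W)
    m = label_maps.shape[0]
    w = np.ones(m) if model_weights is None else np.asarray(model_weights, dtype=float)
    votes = np.zeros((num_classes,) + label_maps.shape[1:])    # (C, H, W)
    for labels, weight in zip(label_maps, w):
        for c in range(num_classes):
            votes[c] += weight * (labels == c)
    return votes.argmax(axis=0)


def logit_fusion(logits, model_weights=None, class_weights=None):
    """Fuse per-model logits (each C x H x W) via a weighted sum of softmax scores.

    Assumption: model_weights has shape (M,), class_weights has shape (M, C).
    """
    logits = np.stack(logits).astype(float)                    # (M, C, H, W)
    m, c = logits.shape[:2]
    mw = np.ones(m) if model_weights is None else np.asarray(model_weights, dtype=float)
    cw = np.ones((m, c)) if class_weights is None else np.asarray(class_weights, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)       # numerically stable softmax
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    weights = (mw[:, None] * cw)[:, :, None, None]             # (M, C, 1, 1)
    return (weights * probs).sum(axis=0).argmax(axis=0)


def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted class matches the reference label."""
    return float((pred == target).mean())


def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in the prediction or the reference."""
    ious = []
    for c in range(num_classes):
        union = np.logical_or(pred == c, target == c).sum()
        if union == 0:
            continue                                           # class absent from both maps
        inter = np.logical_and(pred == c, target == c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))


# Toy 2 x 2 label maps from three hypothetical models.
a = np.array([[0, 1], [1, 1]])
b = np.array([[0, 1], [0, 1]])
c = np.array([[1, 1], [0, 0]])
fused = weighted_majority_vote([a, b, c], num_classes=2)
```

In this sketch, raising one model's weight (for example `model_weights=[1, 1, 3]`) lets that model override the other two wherever they disagree with it, which is the effect a model-wise weighting scheme is meant to provide; the class-wise matrix does the same per class.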
2024
Segmentation
Ensemble Learning
Auto-Annotation
Waste Recycling
Hybrid Framework
Files in this item:
File: Akbarimoafi_Mohammadhossein.pdf
Access: restricted
Size: 9.02 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/86957