Multi-Teacher Trust-Aware Knowledge Distillation for Efficient Image Colorization
NASIRI, POOYA
2024/2025
Abstract
Recent advances in automatic image colorization—driven by diffusion models, transformers, and GAN-based architectures—have brought unprecedented improvements in perceptual realism. Despite their success, such high-capacity networks are computationally demanding, which prevents their practical use in real-time or resource-limited scenarios. This thesis addresses the challenge of reproducing the vivid and semantically consistent results of these large generative models while avoiding their prohibitive runtime and memory requirements. To this end, we introduce TrustDistill, a modular and lightweight framework for perceptual image colorization based on disagreement-aware multi-teacher distillation. The framework transfers knowledge from a heterogeneous set of high-performing teacher models into a compact student network containing fewer than 12 million parameters. The resulting student runs in real time on CPUs (under 150 ms per image) without compromising visual quality. At the core of the method lies a novel distillation strategy that incorporates pixel-wise trust estimation through two complementary forms of uncertainty: epistemic (capturing the agreement among teachers) and aleatoric (measuring consistency between teacher predictions and ground truth). These trust scores govern a hybrid loss that adaptively balances teacher guidance with direct supervision, enabling the student to learn robustly even in challenging or ambiguous regions. Experiments on the Landscape Image Colorization Dataset show that the proposed student achieves quality on par with far larger teacher networks, reporting the lowest LPIPS score (0.1255) and the highest SSIM (0.9511) among evaluated models. In addition, the framework attains more than a tenfold speedup compared to transformer-based or instance-aware approaches, making it highly suitable for deployment on standard consumer hardware.
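The abstract only names the ingredients of the trust mechanism, so the PyTorch sketch below shows one plausible reading of it: per-pixel variance across teachers as the epistemic signal, the ensemble's squared error against ground truth as the aleatoric signal, an exponential mapping from uncertainty to trust, and squared-error loss terms. All of these concrete choices (including the function name `hybrid_distillation_loss`) are illustrative assumptions, not the thesis's actual formulation.

```python
import torch

def hybrid_distillation_loss(student_pred, teacher_preds, target):
    """Trust-gated hybrid loss: a minimal sketch, not the thesis implementation.

    student_pred:  (B, C, H, W) student colorization output (e.g. ab channels)
    teacher_preds: (T, B, C, H, W) stacked predictions from T teachers
    target:        (B, C, H, W) ground-truth color channels
    """
    # Epistemic term: per-pixel variance across teachers. High variance
    # means the teachers disagree, so their consensus deserves less trust.
    epistemic = teacher_preds.var(dim=0, unbiased=False).mean(dim=1, keepdim=True)

    # Aleatoric term: squared error of the teacher ensemble mean against
    # the ground truth, i.e. how consistent the teachers are with the data.
    ensemble = teacher_preds.mean(dim=0)
    aleatoric = (ensemble - target).pow(2).mean(dim=1, keepdim=True)

    # Map total uncertainty to a per-pixel trust score in (0, 1].
    trust = torch.exp(-(epistemic + aleatoric))

    # Hybrid objective: trust gates between teacher guidance and direct
    # supervision, so ambiguous regions lean on the ground truth instead.
    distill_term = (trust * (student_pred - ensemble).pow(2)).mean()
    direct_term = ((1.0 - trust) * (student_pred - target).pow(2)).mean()
    return distill_term + direct_term
```

With, say, three teachers, this would be called as `hybrid_distillation_loss(student(x), torch.stack([t(x) for t in teachers]), y)`. The exponential gate is just one convenient option that keeps the weighting bounded, smooth, and differentiable; the thesis may combine the two uncertainties differently.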
| File | Size | Format |
|---|---|---|
| PooyaNasiri_Thesis.pdf (restricted access) | 6.75 MB | Adobe PDF |
https://hdl.handle.net/20.500.12608/93338