Enhancing Deep Ensembles for Image Classification
SHIEENAVAZ, TAHA
2024/2025
Abstract
Ensembles are almost always preferable to a single model in real-world settings due to their enhanced reliability and scalability. Nonetheless, creating deep ensembles is not as straightforward as it may seem. In this work, I explore several ensembling techniques that have previously proven successful with residual networks and apply them to various image transformer architectures, including the Co-Scale Conv-Attentional Image Transformer, the Pyramid Vision Transformer, the Cross Image Transformer, and the Swin Transformer. The results, along with the proposed approaches, are presented in detail.
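As a rough illustration of the deep-ensemble idea the abstract refers to, the sketch below averages the softmax predictions of several independently trained image transformers. It assumes the `timm` library and ImageNet-pretrained checkpoints; the specific model names and the simple averaging rule are illustrative assumptions, not necessarily the configurations studied in the thesis.

```python
# Minimal deep-ensemble sketch: average softmax outputs of several image
# transformers. Member choices and the combination rule are illustrative
# assumptions, not the thesis's exact setup.
import torch
import timm

# Hypothetical ensemble members; any classifiers sharing a label space work.
member_names = [
    "coat_lite_small",                # Co-Scale Conv-Attentional Image Transformer
    "pvt_v2_b2",                      # Pyramid Vision Transformer v2
    "swin_tiny_patch4_window7_224",   # Swin Transformer
]
members = [timm.create_model(name, pretrained=True).eval() for name in member_names]

@torch.no_grad()
def ensemble_predict(images: torch.Tensor) -> torch.Tensor:
    """Return the mean of per-member softmax probabilities for a batch.

    `images` is assumed to be preprocessed (224x224, normalized) so that it is
    valid input for every ensemble member.
    """
    probs = [torch.softmax(model(images), dim=-1) for model in members]
    return torch.stack(probs, dim=0).mean(dim=0)  # shape: (batch, num_classes)

# Usage: probs = ensemble_predict(batch); predicted_labels = probs.argmax(dim=-1)
```

Averaging probabilities rather than logits is one common choice for deep ensembles; other combination rules (majority voting, learned weighting) fit the same interface.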
| File | Size | Format |
|---|---|---|
| Shieenavaz_Taha.pdf (restricted access) | 2.41 MB | Adobe PDF |
https://hdl.handle.net/20.500.12608/87535