
Enhancing Deep Ensembles for Image Classification

SHIEENAVAZ, TAHA
2024/2025

Abstract

Ensembles are almost always preferable to a single model in real-world settings due to their enhanced reliability and scalability. Nonetheless, creating deep ensembles is not as straightforward as it may seem. In this work, I explore several ensembling techniques previously successful with residual networks, applying them to various image transformer architectures, including Co-Scale Conv-Attentional Image Transformers, Pyramid Vision Transformer, Cross Image Transformer, and Swin Transformer. The results, along with the novel approaches proposed here, are presented in detail.
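A deep ensemble in the sense used here combines several independently trained networks by averaging their predicted class probabilities. A minimal sketch of that combination step, using NumPy and toy logits in place of the real transformer outputs (the shapes and values are illustrative assumptions, not taken from the thesis):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis (the class axis).
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(member_logits):
    """Average the softmax outputs of the ensemble members.

    member_logits: array of shape (n_members, n_samples, n_classes)
    returns: averaged class probabilities, shape (n_samples, n_classes)
    """
    probs = softmax(np.asarray(member_logits))
    return probs.mean(axis=0)

# Toy logits: 3 ensemble members, 2 samples, 4 classes.
logits = np.random.default_rng(0).normal(size=(3, 2, 4))
avg = ensemble_predict(logits)
pred = avg.argmax(axis=-1)  # final ensemble class decisions
```

Averaging probabilities rather than taking a majority vote preserves each member's confidence, which is what makes ensembles useful for uncertainty estimation as well as accuracy.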
Keywords: Deep Learning, Computer Vision, ResNet, Vision Transformers


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/87535