An Empirical Study on Segmentation Methods with Deep Ensembles and Data Augmentation

CUZA, DANIELA

2022/2023

Abstract

In recent years, semantic segmentation, the task of assigning each pixel in an image a label from a given set, has received growing attention [65]. Many computer vision researchers have explored autoencoder architectures to build models that learn both the semantics of an image and a low-level representation of it. In an autoencoder, an encoder maps the input to a low-dimensional representation, which a decoder then uses to reconstruct the original data. This work presents an ensemble that combines convolutional neural networks (CNNs) and transformers. Ensemble methods train multiple models and combine the outputs of the individual classifiers; by exploiting the differing strengths of each classifier, the ensemble improves the overall performance of the system. Distinct loss functions are used to ensure diversity among the individual networks. The ensemble combines the DeepLabV3+, HarDNet, and PVT architectures with varying backbone networks. In addition, a novel loss function is presented that integrates the Dice loss and the Structural Similarity Index (SSIM). The proposed ensemble is evaluated empirically on six real-world scenarios: polyp segmentation, skin segmentation, leukocyte segmentation, butterfly identification, microorganism identification, and radiology segmentation. The proposed model achieves state-of-the-art performance on these scenarios.
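As a rough illustration of the two ideas named in the abstract (combining the outputs of several segmentation networks, and a loss that integrates Dice with the Structural Similarity Index), the following NumPy sketch may help. It is not the thesis implementation. The function names, the average fusion rule, the unweighted sum of the two loss terms, and the single-window (global) SSIM are all assumptions made for illustration; the thesis may use different choices.

```python
import numpy as np


def fuse_average(prob_maps):
    """Fuse per-model foreground probability maps by averaging them.

    Assumption: average fusion; the thesis may use another rule.
    """
    return np.mean(np.stack(prob_maps, axis=0), axis=0)


def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss: 1 - 2*sum(P*T) / (sum(P) + sum(T))."""
    intersection = np.sum(pred * target)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)


def global_ssim(x, y, data_range=1.0):
    """SSIM computed over the whole image as a single window.

    Simplification: standard SSIM averages this quantity over local windows.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mean_x, mean_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mean_x) * (y - mean_y)).mean()
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def dice_ssim_loss(pred, target):
    """Hypothetical combination: Dice loss plus (1 - SSIM), unweighted."""
    return dice_loss(pred, target) + (1.0 - global_ssim(pred, target))


# Toy example: a 4x4 mask with a 2x2 foreground square and two noisy "models".
target = np.zeros((4, 4))
target[1:3, 1:3] = 1.0
model_a = np.clip(target + 0.2, 0.0, 1.0)  # over-confident background
model_b = np.clip(target - 0.2, 0.0, 1.0)  # under-confident foreground
fused = fuse_average([model_a, model_b])
binary_mask = (fused >= 0.5).astype(float)  # threshold the fused probabilities
```

In this toy case the fused map thresholds back to the ground-truth mask, so `dice_ssim_loss(binary_mask, target)` is close to zero, while a fully inverted prediction gives a loss above 1.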

Record

Faculty/Department	Dipartimento di Ingegneria dell'Informazione - DEI
Degree programme	COMPUTER ENGINEERING Laurea Magistrale (D.M. 270/2004)
Academic year	2022
English title	An Empirical Study on Segmentation Methods with Deep Ensembles and Data Augmentation
Keywords	segmentation; ensembles; deep learning; data augmentation; loss function
Supervisor	NANNI, LORIS
Appears in collections	Lauree magistrali (Master's theses)

Files in this item:

File	Size	Format
Cuza_Daniela.pdf (open access)	4.65 MB	Adobe PDF

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/50909