Exploiting patches spatial relations in self-supervised models for vision tasks
MELISSARI, LUCA
2021/2022
Abstract
Self-supervised learning has gained popularity because it avoids the cost of annotating large-scale datasets. It adopts self-defined pseudo-labels as supervision and uses the learned representations for several downstream tasks. In Natural Language Processing it has brought remarkable results, and the state-of-the-art models in this domain benefit substantially from it. It is not an alternative to traditional supervised or unsupervised learning, but it can help achieve better generalization with less human effort spent building labelled datasets. This thesis investigates the use of self-supervised learning in computer vision through spatial-relation tasks between image patches. It examines the improvements in two contexts: a convolutional neural network (ResNet50) used for image classification, called RelCNN, and a transformer-based network, ViT, used for semantic segmentation, named RelVit. In particular, RelVit outperforms the standard ViT in all the experiments performed, whereas RelCNN outperforms ResNet50 in only a few cases, which indicates that applying self-supervised learning to convolutional neural networks requires more sophisticated solutions.

File | Size | Format
---|---|---
Melissari_Luca.pdf (restricted access) | 6.66 MB | Adobe PDF
The text of this website © Università degli Studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/34962