
Self-supervised learning for action segmentation using a Transformer architecture.

MINCATO, EMANUELE
2022/2023

Abstract

The focus of this project is to address the problem of Temporal Action Segmentation (TAS), which consists in temporally segmenting and classifying fine-grained actions in untrimmed videos. Improving this task is a significant, albeit intricate, challenge: actions can occur at different speeds and durations, and some of them can be ambiguous or overlapping. Successfully addressing these challenges can yield substantial advances in domains such as robotics, medical support technologies, and surveillance. Currently, the best-performing state-of-the-art methods are fully supervised; consequently, they incur high annotation costs, do not scale, and are ill-suited to applications where data collection is expensive. To alleviate this problem, we propose a self-supervised Transformer-based method for action segmentation that does not require action labels, and we demonstrate the effectiveness of the learned weights in a weakly supervised setting. Specifically, we built a Siamese architecture based on an improved version of an existing Transformer architecture. To validate our approach, we performed an ablation study and compared our results with the state of the art to draw conclusions.
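As context for the task the abstract describes: a TAS model typically outputs one action label per video frame, and the segmentation is obtained by grouping runs of identical labels into contiguous segments. The sketch below illustrates this grouping step only; the function name and the example labels are illustrative and are not taken from the thesis itself.

```python
def frames_to_segments(frame_labels):
    """Group a per-frame label sequence into (start, end, label) segments.

    A segment is a maximal run of consecutive frames with the same label;
    `start` and `end` are inclusive frame indices.
    """
    segments = []
    start = 0
    for i in range(1, len(frame_labels) + 1):
        # Close the current run when the label changes or the sequence ends.
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            segments.append((start, i - 1, frame_labels[start]))
            start = i
    return segments


# Example: six frames of a cooking video with two action classes.
print(frames_to_segments(["pour", "pour", "stir", "stir", "stir", "pour"]))
# → [(0, 1, 'pour'), (2, 4, 'stir'), (5, 5, 'pour')]
```

Segment-level representations like this are also what segmentation metrics (e.g. segmental edit distance and F1 at various overlap thresholds) are computed on, rather than raw frame accuracy alone.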
2022
Action segmentation
Frame classification
Transformer
Self-supervised
Files in this item:

Mincato_Emanuele.pdf — open access, 5.23 MB, Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12608/52272