Online human action recognition based on skeletal data for human-robot collaboration scenarios
MARCHESINI, ANNA
2023/2024
Abstract
Human Action Recognition (HAR) is an active research area of computer vision concerned with recognizing and classifying human actions from data captured by sensors such as cameras or wearable devices. With the growing interest in human-robot interaction and collaboration scenarios, it is important for the robot to understand and recognize the action performed by the human in real time (online recognition), so that it can behave accordingly. Robustness of the data representation is another key requirement in these contexts. Skeletal data are a simple and informative way of representing human actions, and their robustness makes them particularly well suited to online action recognition. The purpose of this thesis is the study and development of models that perform human action recognition in real time, to be applied in human-robot collaboration scenarios. An in-depth study was conducted on two existing state-of-the-art models, namely InfoGCN++ and STGCN-SWVM, which perform online HAR on skeletal data by exploiting Graph Convolutional Networks (GCNs). The generalization ability of the models is studied by splitting the dataset according to the subjects performing the actions, following the Leave-One-Out Cross-Validation approach. Furthermore, motivated by the prevalence of complex models in the literature, two small, simple models, namely a Convolutional model and an LSTM model, are developed to speed up action recognition with more straightforward architectures, exploiting the so-called ensemble learning strategy. This is achieved by creating a set of classifiers working in parallel, each responsible for recognizing a single class of the dataset, following the One-vs-All approach to multi-class classification. The outputs of all the classifiers are then combined to select the final predictions.
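The One-vs-All combination step described above can be illustrated with a minimal sketch. The thesis does not specify the exact combination rule, so this assumes the standard One-vs-All convention of taking, for each input, the class whose dedicated binary classifier reports the highest confidence; the function name and the example scores are hypothetical.

```python
import numpy as np

def one_vs_all_predict(scores: np.ndarray) -> np.ndarray:
    """Combine per-class binary classifier outputs into final class labels.

    scores: array of shape (num_samples, num_classes), where scores[t, c]
    is the confidence of the c-th binary classifier that sample t belongs
    to class c.
    """
    # Each sample is assigned to the class whose dedicated classifier
    # produced the highest confidence score.
    return np.argmax(scores, axis=1)

# Hypothetical example: 3 samples scored by 4 per-class classifiers.
scores = np.array([
    [0.1, 0.8, 0.05, 0.05],
    [0.6, 0.2, 0.10, 0.10],
    [0.2, 0.1, 0.10, 0.60],
])
print(one_vs_all_predict(scores))  # → [1 0 3]
```

In an online setting, the same rule would be applied frame by frame as the per-class classifiers produce their confidences for the incoming skeletal data.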
The Convolutional models demonstrated strong online recognition ability, outperforming the two state-of-the-art architectures and generalizing better when actions performed by different subjects are considered. The LSTM models, on the other hand, showed some difficulties in online action recognition. In general, models recognize actions with some delay and occasional inaccuracies, due to the time needed to identify the action type and to potential noise in the data.
The text of this website © Università degli studi di Padova. Full Text are published under a non-exclusive license. Metadata are under a CC0 License
https://hdl.handle.net/20.500.12608/80168