Action Recognition in Low-Resolution Videos
DAMETTO, ALEX
2021/2022
Abstract
State-of-the-art action recognition in computer vision focuses primarily on high-quality video in which the action is clearly visible. Currently available models are therefore not designed for low-resolution inputs, and their performance is unsatisfactory under constraints such as limited resolution or short video duration. These conditions are very common in video surveillance, where footage is typically captured at low resolution. In this work we propose three Multi-Scale architectures that adapt to these constraints, together with several training-phase techniques that can significantly improve model performance on low-resolution video. One of the proposed models, the FPN AD-ResNet50, improves on the ResNet18 baseline by +9.2% in F1-Score, +9% in Precision and +8.3% in Recall on the low-resolution TinyVIRAT-v2 action recognition benchmark.

File | Size | Format
---|---|---
Dametto_Alex.pdf (open access) | 3.86 MB | Adobe PDF
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/32822