Action Recognition in Low-Resolution Videos

DAMETTO, ALEX
2021/2022

Abstract

The current state of the art in action recognition, treated as a computer vision problem, focuses primarily on high-quality videos in which the action is clearly visible. Consequently, currently available models are not designed for low-resolution inputs, and their performance is unsatisfactory under constraints on video resolution or duration. Such conditions are very common in video surveillance, where footage is often low-resolution and subject to many of these constraints. In this work we propose three different Multi-Scale architectures designed to adapt to these constraints, together with several training-phase techniques that significantly improve model performance on low-resolution video. One of the proposed models, FPN AD-ResNet50, improves on the ResNet18 baseline by +9.2% in F1-score, +9% in precision and +8.3% in recall on the low-resolution TinyVIRAT-v2 action recognition benchmark.
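The reported gains are expressed in F1-score, precision and recall. As a point of reference, here is a minimal sketch of how these three metrics can be computed for multi-label action recognition, where a single clip may contain several actions at once. It assumes micro-averaging over all (clip, label) pairs; the abstract does not state which averaging scheme is used on TinyVIRAT-v2, and the function name `micro_prf` and the action labels in the usage example are hypothetical.

```python
from typing import List, Set, Tuple


def micro_prf(y_true: List[Set[str]], y_pred: List[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 for multi-label predictions.

    Each element of y_true / y_pred is the set of action labels for one clip.
    Counts are pooled over all clips before computing the ratios (micro-averaging).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        tp += len(truth & pred)  # labels predicted and actually present
        fp += len(pred - truth)  # labels predicted but not present
        fn += len(truth - pred)  # labels present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical ground truth and predictions for three clips.
truth = [{"walking"}, {"walking", "talking"}, {"riding"}]
preds = [{"walking"}, {"walking"}, {"walking"}]
p, r, f = micro_prf(truth, preds)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.5 0.571
```

Because F1 is the harmonic mean of precision and recall, an improvement in F1 need not equal the average of the precision and recall improvements.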
2021
Keywords

action recognition
computer vision
machine learning
video understanding
low-resolution
Full text: Dametto_Alex.pdf (Adobe PDF, 3.86 MB, open access)

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license; metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/32822