Computer vision models for multi-object visual tracking: evaluations and real-world applications

NICOLAI, ANDREA
2021/2022

Abstract

In Artificial Intelligence, the Multi-Object Tracking (MOT) problem consists of detecting targets in videos and reconstructing their trajectories in space; it is commonly applied to surveillance tasks. MOTChallenge was introduced to provide a common, accepted benchmark for the algorithms proposed by the research community. In this work, after formalizing the main concepts underlying the MOT problem, namely how to define it properly and which metrics are involved, we study and select two state-of-the-art trackers according to this benchmark: ByteTrack and FairMOT. We then modify ByteTrack to account for visual cues, in a fashion similar to FairMOT, and train it on the annotated MOT17 dataset. Finally, using the network trained for the MOT20 competition, we track the players during a football match, taking as input the video recorded by a static camera placed at the center of the field. The authors also provided player data from XYZ sensors worn by the home team. We implement an algorithm that preprocesses the video, corrects the radial distortion, projects the tracklets from image coordinates into pitch coordinates, and finally assigns the detected players and their tracklets to the trajectories recorded by the sensors. While the re-identification feature does not seem to improve tracker performance, our algorithm is able to assign a tracklet, on average, to about 60% of the sensor trajectory.
2021
Computer Vision
Object Tracking
Machine Learning
Visual Tracking
Files in this item:
File                               Size      Format
Nicolai_Andrea.pdf (open access)   25.74 MB  Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/29385