Computer vision models for multi-object visual tracking: evaluations and real-world applications
NICOLAI, ANDREA
2021/2022
Abstract
Within the field of Artificial Intelligence, the Multi-Object Tracking (MOT) problem consists of detecting targets in videos and reconstructing their trajectories in space; it is commonly applied to surveillance tasks. MOTChallenge was introduced to provide a common, accepted benchmark for the algorithms proposed by the research community. In this work, after formalizing the main concepts underlying the MOT problem, namely how to properly define the problem and which metrics are involved, we study and select two state-of-the-art trackers according to this benchmark: ByteTrack and FairMOT. We then modify ByteTrack to account for visual cues, in a fashion similar to FairMOT, and train it on the annotated MOT17 dataset. Finally, with the network trained for the MOT20 competition, we track the players during a football match, using as input the video recorded by a static camera placed at the center of the field. Players' data from XYZ sensors worn by the home team were also provided. An algorithm is implemented to preprocess the video, correct the radial distortion, project the tracklets from image coordinates into pitch coordinates, and finally assign the detected players and their tracklets to the trajectories made available by the sensors. While the re-identification feature does not seem to improve tracker performance, our algorithm is able to assign a tracklet, on average, to about 60% of each sensor trajectory.
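The last two steps of the pipeline described above (projection into pitch coordinates and assignment to sensor trajectories) can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes the radial distortion has already been corrected, that a 3x3 homography `H` from image to pitch coordinates is known, and that trajectories are stored as `{frame: (x, y)}` dictionaries. The function names, the toy data, and the brute-force matching are all hypothetical choices made for illustration.

```python
from itertools import permutations
from math import hypot


def project(H, x, y):
    """Apply a 3x3 homography H (nested lists, assumed known) to an image point."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def mean_distance(track, sensor):
    """Mean Euclidean distance over the frames present in both trajectories."""
    common = track.keys() & sensor.keys()
    if not common:
        return float("inf")
    total = sum(
        hypot(track[t][0] - sensor[t][0], track[t][1] - sensor[t][1])
        for t in common
    )
    return total / len(common)


def assign(tracks, sensors):
    """One-to-one matching with minimum total cost.

    Brute force over permutations: only practical for a handful of players,
    and it requires len(tracks) <= len(sensors).
    """
    track_ids = list(tracks)
    best, best_cost = {}, float("inf")
    for perm in permutations(list(sensors), len(track_ids)):
        cost = sum(
            mean_distance(tracks[t], sensors[s]) for t, s in zip(track_ids, perm)
        )
        if cost < best_cost:
            best, best_cost = dict(zip(track_ids, perm)), cost
    return best


# Toy data (hypothetical): a pure scaling homography, 10 pixels per metre.
H = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 1.0]]
image_tracks = {
    "a": {0: (100, 200), 1: (110, 200)},
    "b": {0: (500, 300), 1: (500, 310)},
}
sensors = {
    "p7": {0: (10.5, 20.0), 1: (11.5, 20.0)},
    "p9": {0: (49.0, 30.0), 1: (49.0, 31.0)},
}

# Project every tracklet point from image to pitch coordinates, then match.
pitch_tracks = {
    tid: {t: project(H, x, y) for t, (x, y) in pts.items()}
    for tid, pts in image_tracks.items()
}
matching = assign(pitch_tracks, sensors)
```

For a full squad, a polynomial-time solver for the assignment problem (for example the Hungarian algorithm) would be the more realistic choice than enumerating permutations; which method the thesis itself uses is not stated in the abstract.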
File | Access | Size | Format
---|---|---|---
Nicolai_Andrea.pdf | open access | 25.74 MB | Adobe PDF
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/29385