Optimizing Omnimotion: Enhancing Efficiency and Speed in Dense Full-Pixel Tracking

TLEPIN, SANJAR
2023/2024

Abstract

Motion estimation in computer vision (CV) presents significant challenges, particularly in tracking points across occlusions. The Omnimotion model, introduced in the paper "Tracking Everything Everywhere All at Once," addresses this issue by mapping video points from local frames into a global 3D representation, enabling accurate tracking even when points are temporarily hidden. The model leverages Invertible Neural Networks (INNs) and Neural Radiance Fields (NeRF) to maintain visibility across frames, but its computationally intensive structure and reliance on RAFT for pre-processing demand significant computational resources and lengthy training times. This study proposes several optimizations to reduce these computational demands. Eliminating non-essential loss functions and introducing an embedding mechanism for the internal MLP layers significantly reduced training complexity and time. Additionally, integrating Tiny-CUDA and applying weight-freezing techniques further improved performance, cutting memory usage by 50% and nearly doubling training speed. Although a slight decrease in model quality was observed, these optimizations extend the potential use of Omnimotion to larger datasets and real-time applications.
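
Since the full thesis text is under restricted access, the following is only a minimal, hypothetical sketch of what the two most transferable optimizations named above (a fully fused Tiny-CUDA MLP and weight freezing) typically look like in PyTorch. The dimensions, module names, and configuration values are illustrative assumptions, not the author's actual implementation.

```python
# Illustrative sketch only, not the thesis's code.
# Assumes the tiny-cuda-nn PyTorch bindings are installed:
#   https://github.com/NVlabs/tiny-cuda-nn
import torch
import tinycudann as tcnn

# 1) Replace a standard PyTorch MLP with a fully fused Tiny-CUDA MLP.
#    The layer sizes below are placeholders, not the thesis's configuration.
fused_mlp = tcnn.Network(
    n_input_dims=3,    # e.g. a 3D point in the canonical volume
    n_output_dims=4,   # e.g. RGB + density, as in a NeRF-style head
    network_config={
        "otype": "FullyFusedMLP",
        "activation": "ReLU",
        "output_activation": "None",
        "n_neurons": 64,
        "n_hidden_layers": 2,
    },
)

# 2) Weight freezing: disable gradients for a converged sub-network so later
#    training steps skip its backward pass and optimizer state entirely.
def freeze(module: torch.nn.Module) -> None:
    for p in module.parameters():
        p.requires_grad = False

# Example usage: freeze a hypothetical encoder once it has converged, then
# hand only the still-trainable parameters to the optimizer.
encoder = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.ReLU())
freeze(encoder)
trainable = [p for p in fused_mlp.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)
```

Fully fused MLPs execute the whole network in a single CUDA kernel, which is where the memory and speed gains plausibly come from; freezing shrinks the gradient graph, which compounds that saving during long training runs.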
Keywords

Computer vision
Dense optical flow
Visual tracking
File: MasterThesisTlepin.pdf (Adobe PDF, 14.52 MB, restricted access)

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/71089