Reconstructing Human Hands from Egocentric RGB Data
CHIMENTI, ALBERTO
2021/2022
Abstract
Reconstructing 3D poses from a single RGB image is a challenging task. This computer vision problem carries an inherent ambiguity in determining the depth coordinate of the keypoints. In this work I start by exploring state-of-the-art approaches to the problem, focusing specifically on human hand pose estimation. I consider the most natural settings, including self-interaction and interaction with objects. Whether the ground-truth hand label coordinates are expressed in a reference frame centered on a standard point or on a point internal to the hand plays an important role in the success of the training process. After evaluating the benefits of choosing a specific frame, I propose a multi-stage approach that separately regresses the root-relative pose and the root coordinates in the camera reference frame. The model is then trained and tested on a novel dataset, H2O (2 Hands and Objects).

File | Size | Format | Access
---|---|---|---
Chimenti_Alberto.pdf | 9.84 MB | Adobe PDF | Restricted access
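The split between a root-relative pose and a root position in the camera reference frame, as described in the abstract, can be illustrated with a minimal sketch. This is only the coordinate bookkeeping, not the thesis's model: the choice of keypoint 0 as the root, the function names, and the sample coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def to_root_relative(
    keypoints_cam: List[Point3D], root_index: int = 0
) -> Tuple[List[Point3D], Point3D]:
    """Split camera-frame keypoints into (root-relative pose, root position).

    root_index=0 is an assumption (for example the wrist); the abstract
    does not say which keypoint serves as the root.
    """
    root = keypoints_cam[root_index]
    relative = [
        (x - root[0], y - root[1], z - root[2]) for x, y, z in keypoints_cam
    ]
    return relative, root


def to_camera_frame(relative: List[Point3D], root: Point3D) -> List[Point3D]:
    """Recombine a root-relative pose with the root's camera-frame position."""
    return [(x + root[0], y + root[1], z + root[2]) for x, y, z in relative]


# Hypothetical three-keypoint hand in the camera frame (values are made up).
hand_cam = [(0.10, -0.05, 0.50), (0.12, -0.02, 0.48), (0.15, 0.00, 0.47)]

relative_pose, root_cam = to_root_relative(hand_cam)
reconstructed = to_camera_frame(relative_pose, root_cam)
```

In a multi-stage setup of the kind the abstract outlines, one stage would predict something like `relative_pose` and another something like `root_cam`; adding them back together, as `to_camera_frame` does, yields the absolute keypoints.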
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/36020