Articulated Pose Estimation With Tiny Synthetic Videos
We address the task of articulated pose estimation from video sequences. We consider an interactive setting where the initial pose is annotated in the first frame. Our system synthesizes a large number of hypothetical scenes with different poses and camera positions by applying geometric deformations to the first frame. We use these synthetic images to generate a custom labeled training set for the video in question. This training data is then used to learn a regressor (for future frames) that predicts joint locations from image data. Notably, our training set is so accurate that nearest-neighbor (NN) matching on low-resolution pixel features works well. As such, we name our underlying representation “tiny synthetic videos”. We present quantitative results the Friends benchmark dataset that suggests our simple approach matches or exceed state-of-the-art.
Text ReferenceDennis Park and Deva Ramanan. Articulated pose estimation with tiny synthetic videos. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. 2015.
author = "Park, Dennis and Ramanan, Deva",
title = "Articulated Pose Estimation With Tiny Synthetic Videos",
booktitle = "The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops",
year = "2015"