Detecting Actions, Poses, and Objects with Relational Phraselets
We present a novel approach to modeling human pose, together with
interacting objects, based on compositional models of local visual
interactions and their relations. Skeleton models, while flexible
enough to capture large articulations, fail to accurately model
self-occlusions and interactions. Poselets and Visual Phrases address this
limitation, but do so at the expense of requiring a large set of templates.
We combine all three approaches with a compositional model that is flexible
enough to model detailed articulations but still captures occlusions and
object interactions. Unlike much previous work on action classification,
we do not assume test images are labeled with a person, and instead
present results for “action detection” in an unlabeled image. Notably,
for each detection, our model reports back a detailed description including
an action label, articulated human pose, object poses, and occlusion
flags. We demonstrate that modeling occlusion is crucial for recognizing
human-object interactions. We present results on the PASCAL Action
Classification challenge showing that our unified model advances the
state of the art in detection, action classification, and articulated pose
estimation.
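To make the model description above concrete, the sketch below illustrates the general shape of the scoring function such a compositional model optimizes: each part (phraselet) contributes a local appearance score, pairs of parts contribute a relational (spatial) score, and a per-part occlusion flag swaps the appearance term for a learned occlusion bias. All names, signatures, and functional forms here are illustrative assumptions, not the paper's implementation.

# Illustrative sketch only: a toy relational-phraselet-style scoring
# function. All names and functional forms are hypothetical.
import numpy as np

def score_configuration(locations, occluded, unary, pairwise, pairs, occ_bias):
    """Score one pose/object hypothesis.

    locations: (K, 2) array of part (x, y) positions
    occluded:  length-K boolean occlusion flags
    unary:     callable(k, xy) -> local appearance score for part k
    pairwise:  callable(i, j, xy_i, xy_j) -> relational (spatial) score
    pairs:     list of (i, j) part pairs linked by a relational term
    occ_bias:  per-part score paid for declaring a part occluded
    """
    total = 0.0
    for k in range(len(locations)):
        # An occluded part contributes a learned bias instead of an
        # appearance score, since its image evidence is unreliable.
        total += occ_bias[k] if occluded[k] else unary(k, locations[k])
    for i, j in pairs:
        # Relational terms encode the expected spatial layout of
        # interacting parts (e.g., a hand relative to a held object).
        total += pairwise(i, j, locations[i], locations[j])
    return total

# Toy usage with synthetic scores (purely illustrative).
rng = np.random.default_rng(0)
locs = rng.integers(0, 64, size=(3, 2))
occ = np.array([False, True, False])
unary = lambda k, xy: 1.0                                # stand-in template score
pairwise = lambda i, j, a, b: -0.01 * float(np.sum((a - b) ** 2))  # spring prior
print(score_configuration(locs, occ, unary, pairwise,
                          [(0, 1), (1, 2)], occ_bias=np.full(3, 0.5)))

When the relational pairs form a tree, a score of this form can typically be maximized efficiently over locations and occlusion flags with dynamic programming; the brute-force evaluation above only shows the shape of the objective.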
Download: pdf
Text Reference
Chaitanya Desai and Deva Ramanan. Detecting actions, poses, and objects with relational phraselets. In ECCV (4), pages 158–172, 2012.
BibTeX Reference
@inproceedings{DesaiR_ECCV_2012,
  author    = "Desai, Chaitanya and Ramanan, Deva",
  title     = "Detecting Actions, Poses, and Objects with Relational Phraselets",
  booktitle = "ECCV (4)",
  year      = "2012",
  pages     = "158--172"
}