Efficiently Scaling up Crowdsourced Video Annotation - A Set of Best Practices for High Quality, Economical Video Labeling
We present an extensive three-year study on
economically annotating video with crowdsourced marketplaces.
Our public framework has annotated thousands of
real-world videos, including massive data sets
unprecedented for their size, complexity, and cost. To
accomplish this, we designed a state-of-the-art video
annotation user interface and demonstrate that, despite
common intuition, many contemporary interfaces are
sub-optimal. We present several user studies that evaluate
different aspects of our system and demonstrate
that minimizing the cognitive load of the user is crucial
when designing an annotation platform. We then deploy
this interface on Amazon Mechanical Turk and discover
expert and talented workers who are capable of annotating
difficult videos with dense and closely cropped
labels. We argue that video annotation requires specialized
skill; most workers are poor annotators, mandating robust
quality control protocols. We show that traditional
crowdsourced micro-tasks are not suitable for
video annotation and instead demonstrate that deploying
time-consuming macro-tasks on MTurk is effective.
Finally, we show that by extracting pixel-based features
from manually labeled key frames, we are able to leverage
more sophisticated interpolation strategies to maximize
performance given a fixed budget (sketched below). We validate the
power of our framework on difficult, real-world data sets
and we demonstrate an inherent trade-off between the
mix of human and cloud computing used and the resulting
accuracy and cost of the labeling. We further introduce
a novel, cost-based evaluation criterion that compares
vision algorithms by the budget required to achieve
acceptable performance. We hope our findings will spur
innovation in the creation of massive labeled video data
sets and enable novel data-driven computer vision applications.
(A preliminary version of this work appeared in ECCV 2010
by Vondrick et al.)
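The abstract mentions interpolating object paths between manually labeled key frames. The Python snippet below is a minimal illustrative sketch of the simplest such strategy, linear interpolation of bounding boxes; the Box class and linear_interpolate function are our own hypothetical names, not part of the paper's released tool, and the paper's framework goes further by using pixel-based features extracted at the key frames to drive more sophisticated, tracking-based interpolation.

# Illustrative sketch only, not the authors' implementation: fill in the
# bounding boxes between two manually labeled key frames by linear
# interpolation of the box coordinates.
from dataclasses import dataclass

@dataclass
class Box:
    frame: int
    x: float   # top-left x
    y: float   # top-left y
    w: float   # width
    h: float   # height

def linear_interpolate(a: Box, b: Box) -> list:
    """Return boxes for the frames strictly between key frames a and b."""
    assert a.frame < b.frame, "key frames must be ordered"
    span = b.frame - a.frame
    boxes = []
    for f in range(a.frame + 1, b.frame):
        t = (f - a.frame) / span
        boxes.append(Box(
            frame=f,
            x=(1 - t) * a.x + t * b.x,
            y=(1 - t) * a.y + t * b.y,
            w=(1 - t) * a.w + t * b.w,
            h=(1 - t) * a.h + t * b.h,
        ))
    return boxes

# Example: a worker labels frames 0 and 10; the frames in between are filled in.
first, last = Box(0, 100, 50, 40, 80), Box(10, 160, 60, 40, 80)
path = [first] + linear_interpolate(first, last) + [last]

Linear interpolation is cheap but fails when objects move nonlinearly between key frames, which is the gap the paper's feature-driven interpolation is designed to close under a fixed annotation budget.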
Download: pdf
Text Reference
Carl Vondrick, Donald Patterson, and Deva Ramanan. Efficiently scaling up crowdsourced video annotation - a set of best practices for high quality, economical video labeling. International Journal of Computer Vision, 101(1):184-204, 2013.
BibTeX Reference
@article{VondrickPR_IJCV_2013,
  author  = "Vondrick, Carl and Patterson, Donald and Ramanan, Deva",
  title   = "Efficiently Scaling up Crowdsourced Video Annotation - A Set of Best Practices for High Quality, Economical Video Labeling",
  journal = "International Journal of Computer Vision",
  volume  = "101",
  number  = "1",
  year    = "2013",
  pages   = "184--204"
}