Discriminative models for multi-class object layout
Many state-of-the-art approaches for object recognition reduce the problem to a
0-1 classification task. Such reductions allow one to leverage sophisticated
classifiers for learning. These models are typically trained independently for
each class using positive and negative examples cropped from images. At
test time, various post-processing heuristics such as non-maxima suppression
(NMS) are required to reconcile multiple detections within and between
different classes for each image. Though crucial to good performance on
benchmarks, this post-processing is usually hand-designed rather than learned.
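For readers unfamiliar with NMS, the following is a minimal sketch of the common greedy, per-class variant. It sorts detections by score and discards any window whose intersection-over-union (IoU) with an already-kept, higher-scoring window exceeds a fixed threshold. The box format, the 0.5 threshold, and the function names are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float], thresh: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first.

    A box is discarded if its IoU with any already-kept box exceeds thresh.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= thresh for k in kept):
            kept.append(i)
    return kept
```

For example, given two near-duplicate windows (IoU about 0.68) and a distant third window, only the higher-scoring duplicate and the distant window survive. The fixed, hand-set threshold is the kind of rule the abstract describes as heuristic.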
We introduce a unified model for multi-class object recognition that casts the
problem as a structured prediction task. Rather than predicting a binary label
for each image window independently, our model simultaneously predicts a
structured labeling of the entire image. Our model learns statistics that
capture the spatial arrangements of various object classes in real images,
both in terms of which arrangements to suppress through NMS and which
arrangements to favor through spatial co-occurrence statistics.
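The idea of scoring a whole-image labeling can be illustrated with a minimal sketch. It assumes a score that adds each labeled window's local detector score to pairwise weights indexed by (class, class, spatial relation), and it labels windows with a simple greedy forward search. The relation set (overlap/above/below), the 0.5 overlap cutoff, the class names, and the greedy search are all hypothetical choices made for illustration; the paper's own features and inference procedure may differ.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), y grows downward
PairWeights = Dict[Tuple[str, str, str], float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def relation(a: Box, b: Box) -> str:
    """Hypothetical coarse relation of box b relative to box a."""
    if iou(a, b) > 0.5:
        return "overlap"
    # Compare vertical centers; a smaller y means higher in the image.
    a_cy = (a[1] + a[3]) / 2
    b_cy = (b[1] + b[3]) / 2
    return "above" if b_cy < a_cy else "below"


def layout_score(boxes: List[Box], unary: List[Dict[str, float]],
                 labels: List[Optional[str]], pair_w: PairWeights) -> float:
    """Score of a full-image labeling.

    unary[i][c] is window i's local detector score for class c; a label of
    None means background and contributes nothing. pair_w[(ci, cj, rel)] is
    the weight for a window of class cj in relation rel to one of class ci;
    missing keys count as 0.
    """
    active = [i for i, c in enumerate(labels) if c is not None]
    score = sum(unary[i][labels[i]] for i in active)
    for i in active:
        for j in active:
            if i != j:
                key = (labels[i], labels[j], relation(boxes[i], boxes[j]))
                score += pair_w.get(key, 0.0)
    return score


def greedy_inference(boxes: List[Box], unary: List[Dict[str, float]],
                     classes: List[str], pair_w: PairWeights) -> List[Optional[str]]:
    """Start from all-background and repeatedly add the (window, class)
    assignment with the largest positive score gain; stop when none helps."""
    labels: List[Optional[str]] = [None] * len(boxes)
    current = 0.0
    while True:
        best_gain, best = 0.0, None
        for i in range(len(boxes)):
            if labels[i] is not None:
                continue
            for c in classes:
                labels[i] = c  # tentatively label window i as class c
                gain = layout_score(boxes, unary, labels, pair_w) - current
                labels[i] = None
                if gain > best_gain:
                    best_gain, best = gain, (i, c)
        if best is None:
            return labels
        labels[best[0]] = best[1]
        current += best_gain
```

With a large negative weight on ("person", "person", "overlap"), the second of two heavily overlapping person windows is never added, which mimics NMS. A positive weight on a horse window lying below a person window rewards that co-occurring arrangement, so a horse window can be accepted even when its detector score alone is slightly negative.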
We formulate parameter estimation in our model as a max-margin learning
problem. Given training images with ground-truth object locations, we show how
to formulate learning as a convex optimization problem. We employ a cutting-plane
algorithm to efficiently learn a model from thousands of training
images. We show state-of-the-art results on the PASCAL VOC benchmark that
indicate the benefits of learning a global model encapsulating the spatial layout of multiple object classes.
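The generic structural SVM in its margin-rescaling form illustrates why max-margin learning of a structured model can be posed as a convex problem. In this sketch, X_n is a training image, Y_n its ground-truth labeling, H any other labeling, Ψ a joint feature map, Δ a loss between labelings, and C a regularization constant. These symbols are our own notation, and the paper's exact loss and constraint set may differ.

```latex
\min_{w,\;\xi \ge 0} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{n=1}^{N} \xi_n
\quad \text{s.t.} \quad
w^\top \Psi(X_n, Y_n) - w^\top \Psi(X_n, H) \;\ge\; \Delta(Y_n, H) - \xi_n
\qquad \forall n,\ \forall H
```

The objective is convex and every constraint is linear in (w, ξ), so this is a convex quadratic program. Because there is one constraint per possible labeling H, a cutting-plane method keeps only a small working set of constraints. It repeatedly finds the most violated constraint for each image by solving argmax_H [w^T Ψ(X_n, H) + Δ(Y_n, H)], adds it, and re-solves the QP over the working set until no constraint is violated by more than a tolerance ε.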
Text Reference
Chaitanya Desai, Deva Ramanan, and Charless Fowlkes. Discriminative models for multi-class object layout. In IEEE International Conference on Computer Vision. 2009.

BibTeX Reference
@INPROCEEDINGS{DesaiRF_ICCV_2009,
author = "Desai, Chaitanya and Ramanan, Deva and Fowlkes, Charless",
booktitle = "IEEE International Conference on Computer Vision",
title = "Discriminative models for multi-class object layout",
year = "2009",
tag = "object_recognition"
}