Name

Affiliation

Title

Authors

Abstract

10:15-10:30

Arrival and coffee

10:30-10:55

Piotr Dollár

Caltech

Evaluation of State-of-the-Art Pedestrian Detection

P. Dollár, C. Wojek, B. Schiele and P. Perona

Pedestrian detection is a key problem in computer vision, with several applications including robotics, surveillance and automotive safety. We introduce a new, more realistic dataset two orders of magnitude larger than existing datasets. The dataset contains richly annotated video, recorded from a moving vehicle, with challenging images of low-resolution and frequently occluded people. We propose improved evaluation metrics, demonstrating that commonly used per-window measures are flawed and can fail to predict system performance on full images. We also benchmark several promising detection systems, providing an overview of state-of-the-art performance and a direct, unbiased comparison of existing methods. Finally, by analyzing common failure cases, we help identify future research directions for the field.

11:00-11:25

Jan Prokaj

USC

3-D Model-Based Vehicle Recognition

Jan Prokaj and Gerard Medioni

We present a method for recognizing a vehicle’s make and model in a video clip taken from an arbitrary viewpoint. This improves on existing methods, which require a front view. In addition, we present a Bayesian approach for establishing accurate correspondences in multiple-view geometry. We take a model-based, top-down approach to classify vehicles. First, the vehicle pose is estimated in every frame by calculating its 3-D motion on a plane using a structure-from-motion algorithm. Then, exemplars from a database of 3-D models are rotated to the same pose as the vehicle in the video and projected into the image. Features in the model images and the vehicle image are matched, and a model matching score is computed. The model with the best score is identified as the model of the vehicle in the video. Results on real video sequences are presented.

11:30-11:55

Hamed Pirsiavash

UCI

Bilinear classifiers for visual recognition

Hamed Pirsiavash, Deva Ramanan, Charless Fowlkes

We describe an algorithm for learning bilinear SVMs. Bilinear classifiers are a discriminative variant of bilinear models, which capture the dependence of data on multiple factors. Such models are particularly appropriate for visual data that is better represented as a matrix or tensor than as a vector. Matrix encodings allow for more natural regularization through rank restriction. For example, a rank-one scanning-window classifier yields a separable filter. Low-rank models have fewer parameters, so they are easier to regularize and faster to score at run-time. We learn low-rank models with bilinear classifiers. We also use bilinear classifiers for transfer learning by sharing linear factors between different classification tasks. Bilinear classifiers are trained with biconvex programs. Such programs are optimized with coordinate descent, where each coordinate step requires solving a convex program; in our case, we use a standard off-the-shelf SVM solver. We demonstrate bilinear SVMs on the difficult problems of people detection in video sequences and action classification of video sequences, achieving state-of-the-art results in both.
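The remark that a rank-one scanning-window classifier yields a separable filter can be checked numerically. Below is a minimal plain-Python sketch, assuming a template W = u vᵀ built from two hypothetical factors `u` and `v` (the names `correlate2d` and `separable_correlate` are illustrative, not from the talk). Scoring every window with the full 2-D template gives the same scores as a vertical 1-D pass followed by a horizontal 1-D pass.

```python
def correlate2d(image, template):
    """Valid-mode 2-D cross-correlation: score every window with the full template."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    return [
        [
            sum(template[a][b] * image[i + a][j + b] for a in range(th) for b in range(tw))
            for j in range(iw - tw + 1)
        ]
        for i in range(ih - th + 1)
    ]


def separable_correlate(image, u, v):
    """Same scores for the rank-one template u v^T, computed as two 1-D passes."""
    ih, iw = len(image), len(image[0])
    # Vertical pass: weight len(u) consecutive rows by u.
    cols = [
        [sum(u[a] * image[i + a][j] for a in range(len(u))) for j in range(iw)]
        for i in range(ih - len(u) + 1)
    ]
    # Horizontal pass: weight len(v) consecutive columns by v.
    return [
        [sum(v[b] * row[j + b] for b in range(len(v))) for j in range(iw - len(v) + 1)]
        for row in cols
    ]


u = [1.0, 2.0, -1.0]                      # hypothetical vertical factor
v = [0.5, -0.5]                           # hypothetical horizontal factor
W = [[ua * vb for vb in v] for ua in u]   # rank-one template u v^T

image = [[float((3 * r + 7 * c) % 5) for c in range(6)] for r in range(5)]
full = correlate2d(image, W)
fast = separable_correlate(image, u, v)
max_diff = max(abs(a - b) for ra, rb in zip(full, fast) for a, b in zip(ra, rb))
print("max difference:", max_diff)
```

Per window, the full template costs len(u) × len(v) multiplications, while the separable version costs about len(u) + len(v). This is why low-rank templates are cheaper to score at run-time.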

12:00-12:25

Larry Matthies

JPL

Real-time pedestrian detection and tracking for mobile robots

Safe operation of mobile robots around people is a paramount concern, which has led DoD sponsors of mobile robot research to shift the focus of robot perception research from terrain understanding for obstacle detection to classifying which potential obstacles are people. Unlike much research on pedestrian detection, which uses monocular imagery, this application can take the availability of 3-D sensors as a given, so it makes most sense to incorporate 3-D perception into the detection and tracking process. I will present an update on our work in this area, which uses real-time stereo vision to create a “polar-perspective” map, segments candidate blobs from this map, applies a classifier to image-based and 3-D features of these blobs, and tracks map blobs over time to suppress false alarms. I will also summarize recently started extensions of this work to detect and track cars and to estimate the head pose of detected pedestrians as an aid to robot path planning.

12:30-1:30

Lunch on 6th floor balcony

1:30-1:55

Oscar Beijbom

UCSD

Single image focus level assessment using SVM

Oscar Beijbom

Differential white blood cell count is the process of counting and classifying white blood cells in blood smears. It is one of the most common clinical tests and is performed to support diagnoses in conjunction with medical examinations. These tests can indicate diseases such as infections, allergies and blood cancer, and approximately 200-300 million are performed yearly around the world. Cellavision AB has developed machines that automate this work and is the global leader in this market. The method developed in this thesis will replace and improve the autofocus routine in these machines. It makes it possible to capture a focused image in only two steps, instead of the iterative multi-step algorithm used in most autofocus systems today, including the one currently used at Cellavision. In the proposed method, a Support Vector Machine (SVM) is trained to assess quantitatively, from a single image, both the level and the direction of defocus of that image. The SVM is trained on features that measure both image contrast and image content. High precision is achieved by extracting features from different parts of the image as well as from the image as a whole. This requires the image to be segmented, and a segmentation method is proposed. Using this method, the distance to focus was classified with an error of at most 5 µm for 99.5% of the test images, and with no error for over 85% of them. A defocus of 5 µm is at the borderline of what the human eye perceives as defocused.

2:00-2:25

Chaitanya Desai

UCI

Discriminative models for multi-class object layout

Chaitanya Desai, Deva Ramanan, Charless Fowlkes

Many state-of-the-art approaches for object recognition reduce the problem to a 0-1 classification task. Such reductions allow one to leverage sophisticated classifiers for learning. These models are typically trained independently for each class using positive and negative examples cropped from images. At test-time, various post-processing heuristics such as non-maxima suppression (NMS) are required to reconcile multiple detections within and between different classes for each image. Though crucial to good performance on benchmarks, this post-processing is usually defined heuristically. We introduce a unified model for multi-class object recognition that casts the problem as a structured prediction task. Rather than predicting a binary label for each image window independently, our model simultaneously predicts a structured labeling of the entire image. Our model learns statistics that capture the spatial arrangements of various object classes in real images, both in terms of which arrangements to suppress through NMS and which arrangements to favor through spatial co-occurrence statistics. We formulate parameter estimation in our model as a max-margin learning problem. Given training images with ground-truth object locations, we show how to formulate learning as a convex optimization problem. We employ a cutting plane algorithm to efficiently learn a model from thousands of training images. We show state-of-the-art results on the PASCAL VOC benchmark that indicate the benefits of learning a global model encapsulating the spatial layout of multiple object classes.
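To show the kind of hand-tuned post-processing this talk proposes to replace, here is a minimal sketch of the common greedy, per-class NMS heuristic. It is not the authors' learned model. Boxes are assumed to be `(x1, y1, x2, y2, score)` tuples, and the 0.5 intersection-over-union threshold is an illustrative assumption.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2, ...)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, overlap=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it by more than `overlap`, repeat."""
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [b for b in remaining if iou(best, b) <= overlap]
    return kept


detections = [
    (10, 10, 50, 90, 0.9),     # strong detection
    (12, 14, 52, 92, 0.8),     # near-duplicate of the first (IoU about 0.84)
    (100, 20, 140, 100, 0.7),  # separate object
]
print(greedy_nms(detections))  # [(10, 10, 50, 90, 0.9), (100, 20, 140, 100, 0.7)]
```

The fixed threshold and the class-by-class treatment are exactly the heuristic choices that a learned model of spatial arrangements can replace with statistics estimated from training images.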

2:30-2:55

Baback Moghaddam

JPL

Low-Level Vision for Planetary Change Detection

I will present a prototype automatic vision system for planetary image change detection. Applications include finding new craters and "gullies" on Mars from current orbiting platforms, data-mining multi-mission legacy image databases for undiscovered geologic phenomena, and automating the search for "lost" spacecraft.

3:00-3:25

Hartmut Neven

Google

Which capabilities are missing when trying to design a comprehensive visual search engine?

Computer vision has made significant advances during the last decade. Many capabilities, such as the detection of faces or the recognition of rigid textured objects such as landmarks, now work at very satisfactory levels. Across the various products and services offered by Google, we are interested in analyzing every aspect of an image crawled from the web. When designing such a comprehensive system, however, it becomes obvious that important abilities are still lacking. One example is object class recognition that scales to thousands or even millions of classes. Another area where we still face obstacles is the reliable recognition of objects that have little surface texture and are largely defined by their contours. Even a seemingly simple task such as reading text in a photo still lacks the accuracy we need. The talk describes our efforts in designing a large-scale image recognition system that can analyze any given image on the web along many dimensions. We report on the recognition disciplines in which we have made good progress, but more importantly we call out areas that still require additional work to reach production-ready solutions.

3:30-4:00

Coffee Break

4:00-4:25

Luis Goncalves

Evolution Robotics Retail

Computer Vision for Retail Fraud

TBA

A brief description of the work done at Evolution Robotics Retail using computer vision to prevent retail fraud.

4:30-4:55

Ricky Sethi

UCR

Activity Recognition Using a Physics-Based Data Driven Hamiltonian Monte Carlo

Ricky J. Sethi, Amit K. Roy-Chowdhury and Brian E. Moore

Motion and image analysis are both important for activity recognition in video. We present a new approach that extends Hamiltonian Monte Carlo (HMC) so that we can search simultaneously over the combined motion and image space in a concerted manner using well-known Markov Chain Monte Carlo (MCMC) techniques. For motion analysis in video, we use tracks generated from the video to calculate the Hamiltonian equations of motion for the systems under study, thus utilizing analytical Hamiltonian dynamics to derive a physically significant HMC algorithm that can be used for activity analysis. We then use image analysis to help explore both the motion energy space and the image space by integrating the Hamiltonian energy-based approach with an image-based data-driven proposal to drive the HMC, yielding a Data Driven HMC (DDHMC). Driving the Hamiltonian dynamics-based MCMC with image data in this DDHMC reduces the enormous search space. We also develop the reverse algorithm, which uses motion energy proposals to search the image space. Experimental validation of the theory is provided on the well-known USF Gait and Weizmann datasets. While HMC has been used in other contexts, this is possibly the first paper that shows how it can be used for activity recognition in video, taking into account the image analysis results and using the physical motion information of the system. In addition, the DDHMC framework has potential application to other domains where statistical sampling techniques are useful, as we outline in the section on future work.
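The approach builds on standard Hamiltonian Monte Carlo. As background only, and not the DDHMC method itself, here is a minimal sketch of one HMC transition: leapfrog integration of Hamiltonian dynamics followed by a Metropolis accept/reject step. It targets a one-dimensional standard Gaussian. The potential U(q) = q²/2, the step size and the number of leapfrog steps are illustrative assumptions.

```python
import math
import random


def U(q):
    """Potential energy: negative log density of a standard Gaussian (up to a constant)."""
    return 0.5 * q * q


def grad_U(q):
    """Gradient of the potential energy."""
    return q


def hmc_step(q, step_size=0.1, n_leapfrog=20, rng=random):
    """One HMC transition: sample momentum, simulate dynamics, Metropolis accept/reject."""
    p = rng.gauss(0.0, 1.0)
    q_new, p_new = q, p
    # Leapfrog integration of dq/dt = p, dp/dt = -dU/dq.
    p_new -= 0.5 * step_size * grad_U(q_new)
    for i in range(n_leapfrog):
        q_new += step_size * p_new
        if i != n_leapfrog - 1:
            p_new -= step_size * grad_U(q_new)
    p_new -= 0.5 * step_size * grad_U(q_new)
    # Hamiltonian H = U(q) + p^2 / 2; accept with probability min(1, exp(H_old - H_new)).
    h_old = U(q) + 0.5 * p * p
    h_new = U(q_new) + 0.5 * p_new * p_new
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return q_new
    return q


rng = random.Random(0)
q, samples = 0.0, []
for _ in range(5000):
    q = hmc_step(q, rng=rng)
    samples.append(q)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean {mean:.3f}, variance {var:.3f}")  # expected to be close to 0 and 1
```

In the data-driven variant described above, image-based proposals would additionally guide where the sampler moves. This sketch uses only the gradient of a fixed potential.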

5:00-5:25

Anup Doshi

UCSD

Vision-based Driver Attention and Behavior Inference

Anup Doshi and Mohan M. Trivedi

We introduce a new approach to analyzing the attentive and behavioral state of a human subject, given cameras focused on the subject and their environment. In particular, the task of analyzing the focus of attention of a human driver is of primary concern. Up to 80% of crashes are related to driver inattention; thus it is important for an Intelligent Driver Assistance System (IDAS) to be aware of the driver state. We present a new Bayesian paradigm for estimating human attention specifically addressing the problems arising in live driving situations. We will then discuss several novel findings about driver behavior and how those can affect the design of a vision-based HCI interface for driver assistance.

5:30-5:55

Pietro Perona

Caltech

Towards the visual analysis of animal behavior

TBA

6:00-8:00

Dinner