
10:15-10:30  Arrival and coffee


10:30-10:55  Piotr Dollár (Caltech)
Title: Evaluation of State-of-the-Art Pedestrian Detection
Authors: P. Dollár, C. Wojek, B. Schiele and P. Perona
Abstract: Pedestrian detection is a key problem in computer vision, with several applications including robotics, surveillance and automotive safety. We introduce a new, more realistic dataset two orders of magnitude larger than existing datasets. The dataset contains richly annotated video, recorded from a moving vehicle, with challenging images of low-resolution, frequently occluded people. We propose improved evaluation metrics, demonstrating that commonly used per-window measures are flawed and can fail to predict system performance on full images. We also benchmark several promising detection systems, providing an overview of state-of-the-art performance and a direct, unbiased comparison of existing methods. Finally, by analyzing common failure cases, we help identify future research directions for the field.
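
The per-image evaluation argued for here is usually reported as miss rate versus false positives per image (FPPI). The sketch below is a minimal, hypothetical version of that style of scoring; the 0.5 IoU matching threshold, the data layout, and the helper names are assumptions, not the paper's protocol.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def miss_rate_at_fppi(detections, ground_truth, n_images, fppi_target=1.0, iou_thr=0.5):
    """detections: list of (image_id, box, score); ground_truth: dict image_id -> list of boxes.
    Sweeps the score threshold and reports the miss rate at a given FPPI level."""
    dets = sorted(detections, key=lambda d: -d[2])          # high score first
    n_gt = sum(len(v) for v in ground_truth.values())
    matched = {i: [False] * len(b) for i, b in ground_truth.items()}
    tp = fp = 0
    best_mr = 1.0
    for img, box, _ in dets:
        hit = False
        for j, gt_box in enumerate(ground_truth.get(img, [])):
            if not matched[img][j] and iou(box, gt_box) >= iou_thr:
                matched[img][j] = True
                hit = True
                break
        tp, fp = tp + hit, fp + (not hit)
        if fp / n_images <= fppi_target:    # still within the FPPI budget
            best_mr = 1.0 - tp / n_gt       # miss rate at this score threshold
    return best_mr
```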


11:00-11:25  Jan Prokaj (USC)
Title: 3-D Model Based Vehicle Recognition
Authors: Jan Prokaj and Gerard Medioni
Abstract: We present a method for recognizing a vehicle's make and model in a video clip taken from an arbitrary viewpoint. This is an improvement over existing methods, which require a front view. In addition, we present a Bayesian approach for establishing accurate correspondences in multiple-view geometry. We take a model-based, top-down approach to classifying vehicles. First, the vehicle pose is estimated in every frame by calculating its 3-D motion on a plane using a structure-from-motion algorithm. Then, exemplars from a database of 3-D models are rotated to the same pose as the vehicle in the video and projected to the image. Features in the model images and the vehicle image are matched, and a model matching score is computed. The model with the best score is identified as the model of the vehicle in the video. Results on real video sequences are presented.
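
To make the exemplar-scoring loop concrete, here is a rough sketch of scoring rendered model views against the observed vehicle. ORB features with brute-force matching are a stand-in for the paper's feature matcher, and rendering the exemplars in the estimated pose is assumed to happen elsewhere; none of this is the authors' actual implementation.

```python
import cv2

def match_score(model_render, vehicle_crop):
    """Count ORB feature matches between a rendered 3-D model view and the
    vehicle image; more consistent matches means a better model hypothesis."""
    orb = cv2.ORB_create()
    _, d1 = orb.detectAndCompute(model_render, None)
    _, d2 = orb.detectAndCompute(vehicle_crop, None)
    if d1 is None or d2 is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(d1, d2)
    return sum(1 for m in matches if m.distance < 50)   # crude distance gate

def recognize(vehicle_crop, rendered_exemplars):
    """rendered_exemplars: dict model_name -> image of that 3-D model rendered
    in the vehicle's estimated pose. Returns the best-scoring model name."""
    scores = {name: match_score(img, vehicle_crop)
              for name, img in rendered_exemplars.items()}
    return max(scores, key=scores.get)
```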


11:30-11:55  Hamed Pirsiavash (UCI)
Title: Bilinear classifiers for visual recognition
Authors: Hamed Pirsiavash, Deva Ramanan, Charless Fowlkes
Abstract: We describe an algorithm for learning bilinear SVMs. Bilinear classifiers are a discriminative variant of bilinear models, which capture the dependence of data on multiple factors. Such models are particularly appropriate for visual data that is better represented as a matrix or tensor rather than a vector. Matrix encodings allow for more natural regularization through rank restriction; for example, a rank-one scanning-window classifier yields a separable filter. Low-rank models have fewer parameters and so are easier to regularize and faster to score at run time. We learn low-rank models with bilinear classifiers. We also use bilinear classifiers for transfer learning by sharing linear factors between different classification tasks. Bilinear classifiers are trained with biconvex programs. Such programs are optimized with coordinate descent, where each coordinate step requires solving a convex program; in our case, we use a standard off-the-shelf SVM solver. We demonstrate bilinear SVMs on the difficult problems of people detection in video sequences and action classification of video sequences, achieving state-of-the-art results on both.
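
As a loose illustration of the biconvex training scheme, the sketch below alternates between the two linear factors of W = U Vᵀ, solving a standard linear SVM at each coordinate step. scikit-learn's LinearSVC stands in for the off-the-shelf solver; the data layout, the rank, and the use of LinearSVC's default regularizer (rather than the paper's bilinear regularizer) are all assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

def bilinear_svm(X, y, rank=2, n_rounds=5, C=1.0, seed=0):
    """Coordinate-descent sketch for a bilinear SVM with W = U @ V.T.
    X: array of shape (n, d1, d2) of matrix-valued examples; y: labels in {-1, +1}.
    Each coordinate step is an ordinary linear SVM, as in the abstract."""
    n, d1, d2 = X.shape
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((d2, rank))
    for _ in range(n_rounds):
        # Fix V: score(X) = <U, X @ V>, so train a linear SVM on flattened X @ V.
        feats_u = (X @ V).reshape(n, -1)
        U = LinearSVC(C=C).fit(feats_u, y).coef_.reshape(d1, rank)
        # Fix U: score(X) = <V, X^T @ U>, so train on flattened X^T @ U.
        feats_v = (np.transpose(X, (0, 2, 1)) @ U).reshape(n, -1)
        V = LinearSVC(C=C).fit(feats_v, y).coef_.reshape(d2, rank)
    return U, V   # low-rank classifier W = U @ V.T
```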


12:00-12:25  Larry Matthies (JPL)
Title: Real-time pedestrian detection and tracking for mobile robots
Abstract: Safe operation of mobile robots around people is a paramount concern, which has led DoD sponsors of mobile robot research to shift the focus of robot perception research from terrain understanding for obstacle detection to classifying which potential obstacles are people. Unlike much research on pedestrian detection, which uses monocular imagery, in this application the availability of 3-D sensors is a given, so it makes most sense to incorporate 3-D perception in the detection and tracking process. I will present an update on our work in this area, which uses real-time stereo vision to create a “polar-perspective” map, segments candidate blobs from this map, applies a classifier to image-based and 3-D features of these blobs, and tracks map blobs over time to suppress false alarms. I will also summarize recently started extensions to this work to detect and track cars as well, and to estimate the head pose of detected pedestrians as an aid to robot path planning.
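
To make the map-then-segment stage concrete, here is a minimal sketch of accumulating stereo points into a polar (range by bearing) occupancy grid and extracting candidate blobs as connected components. The grid resolution, the hit threshold, and the use of SciPy's connected-component labeling are illustrative assumptions, not the system's actual implementation.

```python
import numpy as np
from scipy import ndimage

def polar_occupancy(points, r_max=20.0, n_r=80, n_theta=180, min_hits=30):
    """points: (N, 3) array of stereo points (x forward, y lateral, z up).
    Bins points into a polar occupancy map and labels over-occupied cells
    as candidate blobs, which would then feed the classifier and tracker."""
    x, y = points[:, 0], points[:, 1]
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)                     # bearing in [-pi, pi]
    keep = r < r_max
    r_bin = (r[keep] / r_max * n_r).astype(int)
    t_bin = ((theta[keep] + np.pi) / (2 * np.pi) * n_theta).astype(int)
    t_bin = t_bin.clip(0, n_theta - 1)
    grid = np.zeros((n_r, n_theta), dtype=int)
    np.add.at(grid, (r_bin, t_bin), 1)           # accumulate point hits per cell
    labels, n_blobs = ndimage.label(grid >= min_hits)
    return grid, labels, n_blobs
```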


12:30-1:30  Lunch on 6th floor balcony


1:30-1:55  Oscar Beijbom (UCSD)
Title: Single image focus level assessment using SVM
Authors: Oscar Beijbom
Abstract: Differential white blood cell count is the process of counting and classifying white blood cells in blood smears. It is one of the most common clinical tests, performed in order to make diagnoses in conjunction with medical examinations. These tests indicate diseases such as infections, allergies, and blood cancer, and approximately 200-300 million are performed yearly around the world. Cellavision AB has developed machines that automate this work and is the global leader in this market. The method developed in this thesis will replace and improve the autofocus routine in these machines. It makes it possible to capture a focused image in only two steps, instead of using an iterative multi-step algorithm like those used today in most autofocus systems, including the one currently used at Cellavision. In the proposed method, a Support Vector Machine (SVM) is trained to assess quantitatively, from a single image, the level of defocus as well as the direction of defocus for that image. The SVM is trained on features that measure both the image contrast and the image content. High precision is made possible by extracting features from different parts of the image as well as from the image as a whole; this requires the image to be segmented, and a method for doing so is proposed. Using this method, the distance to focus was classified within 5 µm of the true value for 99.5% of the test images, while over 85% were classified completely correctly. A 5 µm defocus is borderline to what the human eye perceives as defocused.
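
As an illustration of the general recipe (contrast features from image tiles plus the whole image, fed to an SVM), here is a hypothetical sketch using common focus measures. The specific features, the 3x3 tiling, and the use of support vector regression for the signed defocus distance are assumptions, not the thesis's actual design.

```python
import numpy as np
from scipy import ndimage
from sklearn.svm import SVR

def focus_features(img, grid=3):
    """Contrast features from the whole image and from grid x grid tiles.
    Variance-of-Laplacian and gradient energy are common focus measures;
    they stand in for the thesis's contrast and content features."""
    feats = []
    h, w = img.shape
    tiles = [img] + [img[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
                     for i in range(grid) for j in range(grid)]
    for t in tiles:
        t = t.astype(float)
        lap = ndimage.laplace(t)
        gy, gx = np.gradient(t)
        feats += [lap.var(), (gx**2 + gy**2).mean()]
    return np.array(feats)

def train_focus_model(images, defocus_um):
    """images: list of 2-D grayscale arrays; defocus_um: signed distances to
    focus in micrometers (the sign encodes the defocus direction)."""
    X = np.stack([focus_features(im) for im in images])
    return SVR(kernel="rbf").fit(X, defocus_um)
```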


2:00-2:25  Chaitanya Desai (UCI)
Title: Discriminative models for multi-class object layout
Authors: Chaitanya Desai, Deva Ramanan, Charless Fowlkes
Abstract: Many state-of-the-art approaches for object recognition reduce the problem to a 0-1 classification task. Such reductions allow one to leverage sophisticated classifiers for learning. These models are typically trained independently for each class using positive and negative examples cropped from images. At test-time, various post-processing heuristics such as non-maxima suppression (NMS) are required to reconcile multiple detections within and between different classes for each image. Though crucial to good performance on benchmarks, this post-processing is usually defined heuristically. We introduce a unified model for multi-class object recognition that casts the problem as a structured prediction task. Rather than predicting a binary label for each image window independently, our model simultaneously predicts a structured labeling of the entire image. Our model learns statistics that capture the spatial arrangements of various object classes in real images, both in terms of which arrangements to suppress through NMS and which arrangements to favor through spatial co-occurrence statistics. We formulate parameter estimation in our model as a max-margin learning problem. Given training images with ground-truth object locations, we show how to formulate learning as a convex optimization problem. We employ a cutting plane algorithm to efficiently learn a model from thousands of training images. We show state-of-the-art results on the PASCAL VOC benchmark that indicate the benefits of learning a global model encapsulating the spatial layout of multiple object classes.
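
One common way to do inference in a layout model of this kind is a greedy forward search over candidate windows under learned unary and pairwise terms. The sketch below is an illustrative version of that idea, assuming precomputed scores; the data layout and the stopping rule are assumptions, not necessarily the paper's exact procedure.

```python
def greedy_layout(windows, unary, pairwise):
    """Greedy inference sketch for a multi-class layout model.
    windows: candidate detections; unary[i]: local detector score for window i;
    pairwise[i][j]: learned interaction between windows i and j (negative for
    NMS-style suppression of overlaps, positive for favorable co-occurrence).
    Repeatedly add the detection with the best marginal gain while positive."""
    selected = []
    remaining = set(range(len(windows)))
    while remaining:
        def gain(i):
            return unary[i] + sum(pairwise[i][j] for j in selected)
        best = max(remaining, key=gain)
        if gain(best) <= 0:      # no window improves the global score; stop
            break
        selected.append(best)
        remaining.remove(best)
    return [windows[i] for i in selected]
```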


2:30-2:55  Baback Moghaddam (JPL)
Title: Low-Level Vision for Planetary Change Detection
Abstract: I will present a prototype automatic vision system for planetary image change detection. Applications include finding new craters and "gullies" on Mars from current orbiting platforms, data-mining multi-mission legacy image databases for undiscovered geologic phenomena, and automating the search for "lost" spacecraft.


3:00-3:25  Hartmut Neven (Google)
Title: Which capabilities are missing when trying to design a comprehensive visual search engine?
Abstract: Computer vision has made significant advances during the last decade. Many capabilities, such as the detection of faces or the recognition of rigid textured objects such as landmarks, now work to very satisfying levels. Across the various products and services offered by Google, we are interested in analyzing an image crawled on the web in all its aspects. When designing such a comprehensive system, however, it becomes obvious that important abilities are still lacking. One example is object class recognition that scales to thousands or even millions of classes. Another area where we still face obstacles is the reliable recognition of objects that have little surface texture and are largely contour-defined. Even a seemingly simple task such as reading text in a photo still lacks the accuracy we need. The talk describes our efforts in designing a large-scale image recognition system that can analyze any given image on the web along many dimensions. We report on the recognition disciplines in which we have made good progress and, more importantly, call out areas that still require additional work to reach production-ready solutions.


3:30-4:00  Coffee Break


4:00-4:25  Luis Goncalves (Evolution Robotics Retail)
Title: Computer Vision for Retail Fraud
Authors: TBA
Abstract: A brief description of the work done at Evolution Robotics Retail, using computer vision to prevent retail fraud.


4:30-4:55  Ricky Sethi (UCR)
Title: Activity Recognition Using a Physics-Based Data Driven Hamiltonian Monte Carlo
Authors: Ricky J. Sethi, Amit K. Roy-Chowdhury, Brian E. Moore
Abstract: Motion and image analysis are both important for activity recognition in video. We present a new approach that extends Hamiltonian Monte Carlo (HMC) to allow us to search simultaneously over the combined motion and image space in a concerted manner, using well-known Markov Chain Monte Carlo (MCMC) techniques. For motion analysis in video, we use tracks generated from the video to calculate the Hamiltonian equations of motion for the systems under study, thus utilizing analytical Hamiltonian dynamics to derive a physically significant HMC algorithm for activity analysis. We then use image analysis to help explore both the motion energy space and the image space by integrating the Hamiltonian energy-based approach with an image-based, data-driven proposal to drive the HMC, yielding a Data Driven HMC (DDHMC). Driving the Hamiltonian dynamics-based MCMC with image data in this way reduces the enormity of the search space. We also develop the reverse algorithm, which uses motion energy proposals to search the image space. Experimental validation of the theory is provided on the well-known USF Gait and Weizmann datasets. While HMC has been used in other contexts, this is possibly the first paper to show how HMC can be used for activity recognition in video, taking into account image analysis results and using the physical motion information of the system. In addition, the DDHMC framework has potential application to other domains where statistical sampling techniques are useful, as we outline in the section on future work.
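
For readers unfamiliar with the building block being extended, here is a generic leapfrog-plus-Metropolis HMC step. It does not model the paper's motion-energy Hamiltonian or its data-driven proposals; the step size and trajectory length below are arbitrary illustrative choices.

```python
import numpy as np

def hmc_step(q, log_prob, grad_log_prob, step=0.05, n_leapfrog=20, rng=None):
    """One standard Hamiltonian Monte Carlo step on state q (a 1-D array):
    resample momentum, simulate Hamiltonian dynamics with leapfrog
    integration, then accept or reject with a Metropolis correction."""
    rng = rng or np.random.default_rng()
    p = rng.standard_normal(q.shape)             # resample momentum
    q_new, p_new = q.copy(), p.copy()
    p_new += 0.5 * step * grad_log_prob(q_new)   # half step in momentum
    for _ in range(n_leapfrog - 1):
        q_new += step * p_new                    # full step in position
        p_new += step * grad_log_prob(q_new)     # full step in momentum
    q_new += step * p_new
    p_new += 0.5 * step * grad_log_prob(q_new)   # final half step
    # Metropolis test with H(q, p) = -log_prob(q) + |p|^2 / 2
    h_old = -log_prob(q) + 0.5 * p @ p
    h_new = -log_prob(q_new) + 0.5 * p_new @ p_new
    return q_new if np.log(rng.random()) < h_old - h_new else q
```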


5:00-5:25  Anup Doshi (UCSD)
Title: Vision-based Driver Attention and Behavior Inference
Authors: Anup Doshi and Mohan M. Trivedi
Abstract: We introduce a new approach to analyzing the attentive and behavioral state of a human subject, given cameras focused on the subject and their environment. In particular, the task of analyzing the focus of attention of a human driver is of primary concern. Up to 80% of crashes are related to driver inattention; thus it is important for an Intelligent Driver Assistance System (IDAS) to be aware of the driver's state. We present a new Bayesian paradigm for estimating human attention, specifically addressing the problems arising in live driving situations. We will then discuss several novel findings about driver behavior and how they can affect the design of a vision-based human-computer interface for driver assistance.
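
As a toy illustration of Bayesian attention estimation (not the authors' model), the sketch below runs a discrete Bayes filter over coarse attention states given noisy gaze-zone observations. The states, transition matrix, and observation likelihoods are invented for illustration.

```python
import numpy as np

STATES = ["road", "mirror", "dashboard", "distracted"]
T = np.array([[0.90, 0.04, 0.03, 0.03],        # P(next state | current state)
              [0.50, 0.40, 0.05, 0.05],
              [0.50, 0.05, 0.40, 0.05],
              [0.40, 0.05, 0.05, 0.50]])
OBS = np.array([[0.80, 0.08, 0.06, 0.06],      # P(observed gaze zone | state)
                [0.10, 0.80, 0.05, 0.05],
                [0.10, 0.05, 0.80, 0.05],
                [0.25, 0.10, 0.10, 0.55]])

def update(belief, gaze_zone):
    """One predict-correct step: propagate through T, reweight by the observation."""
    predicted = belief @ T
    posterior = predicted * OBS[:, gaze_zone]
    return posterior / posterior.sum()

belief = np.full(4, 0.25)                      # uniform prior over states
for z in [0, 0, 3, 3]:                         # gaze zones: road, road, off-road, off-road
    belief = update(belief, z)
print(dict(zip(STATES, belief.round(3))))
```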


5:30-5:55  Pietro Perona (Caltech)
Title: Towards the visual analysis of animal behavior
Authors: TBA
Abstract: TBA


6:00-8:00  Dinner
