10:15-10:30 Arrival Coffee and pastries. The morning meeting will be held in 4011 Donald Bren Hall.
10:30-10:50 Ramesh Jain UCI Multimedia Information Retrieval Ramesh Jain and Team of Students We will discuss issues and challenges in multimedia information retrieval in the context of computer vision research, and describe two major research projects in this area from our research group.
10:55-11:15 Antoni Chan UCSD Counting People without People Models or Tracking Antoni B. Chan and Nuno Vasconcelos We present a privacy-preserving system for estimating the size of inhomogeneous crowds, composed of pedestrians that travel in different directions, without using explicit object segmentation or tracking. First, the crowd is segmented into components of homogeneous motion. Second, a set of simple holistic features is extracted from each segmented region, and the correspondence between features and the number of people per segment is learned with Gaussian Process regression. We validate both the crowd segmentation algorithm and the crowd counting system on a large pedestrian dataset (2000 frames of video, containing 49,885 total pedestrian instances). Finally, we present results of the system running on a full hour of video.
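To make the regression step concrete, below is a minimal sketch of counting by Gaussian Process regression, assuming per-segment holistic features have already been extracted; the feature values, names, and kernel choice are illustrative assumptions, not the authors' setup.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# X_train: one row of holistic features per crowd segment (e.g. area,
# perimeter, edge count); y_train: annotated person counts for those segments
X_train = np.array([[120., 85., 300.], [240., 160., 610.], [60., 40., 150.]])
y_train = np.array([4, 9, 2])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=100.0) + WhiteKernel(),
                              normalize_y=True).fit(X_train, y_train)

# predict the count (with uncertainty) for a newly segmented region
X_new = np.array([[180., 120., 450.]])
count, std = gp.predict(X_new, return_std=True)
print(f"estimated count: {count[0]:.1f} +/- {std[0]:.1f}")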
11:20-11:40 Baback Moghaddam Caltech/JPL Subspectral Algorithms for Sparse Visual Learning Baback Moghaddam I will present a set of "subspectral" (sparse eigenvector) algorithms for various visual discrimination tasks. Examples include sparse 2D eigenfaces, 3D face-mesh decomposition, parts-based facial gender classification, and pixel selection for OCR using Bayesian (GP) classification. The key "Sparse-LDA" algorithm is shown to be an attractive alternative to Automatic Relevance Determination (ARD), and state-of-the-art recognition is obtained while discarding the majority of pixels (parts). Moreover, these sparse object models show a better fit to the data in terms of the "evidence" or marginal likelihood.
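As background, the sketch below is a generic truncated power iteration that keeps only k coordinates of the leading eigenvector; it conveys the flavor of sparse (subspectral) decompositions but is not the talk's Sparse-LDA algorithm, and the test matrix and cardinality are made up for illustration.

import numpy as np

def truncated_power_iteration(A, k, n_iter=200):
    """Approximate the leading eigenvector of symmetric A with at most k nonzeros."""
    n = A.shape[0]
    x = np.ones(n) / np.sqrt(n)
    for _ in range(n_iter):
        x = A @ x
        keep = np.argsort(np.abs(x))[-k:]      # retain the k largest-magnitude entries
        mask = np.zeros(n, dtype=bool)
        mask[keep] = True
        x = np.where(mask, x, 0.0)
        x /= np.linalg.norm(x)
    return x

rng = np.random.default_rng(0)
B = rng.standard_normal((8, 8))
A = B @ B.T                                    # symmetric PSD test matrix
v = truncated_power_iteration(A, k=3)
print(np.nonzero(v)[0], v @ A @ v)             # chosen support and Rayleigh quotient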
11:45-12:05 Antony Lam UCR 3D Recognition for Faces and Objects Antony Lam and Christian R. Shelton We present work on face recognition over 3D rotations of the head using a multiview-based approach. In addition, we present preliminary work on extending this approach to objects in general.
12:10-12:30 Greg Griffin Caltech Learning and Using Taxonomies For Fast Visual Category Recognition Gregory Griffin and Pietro Perona The computational complexity of current visual categorization algorithms scales linearly at best with the number of categories. The goal of classifying Ncat = 10^4-10^5 visual categories simultaneously requires sub-linear classification cost. We explore algorithms for automatically building classification trees which have, in principle, log Ncat complexity. We find that a greedy algorithm that recursively splits the set of categories into the two minimally confused subsets achieves 5-20 fold speedups at a small cost in classification performance. Our approach is independent of the specific classification algorithm used. A welcome by-product of our algorithm is a very reasonable taxonomy of the Caltech-256 dataset.
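The recursive split can be sketched as follows, operating on a category confusion matrix C; the local-search initialization and stopping rule here are assumptions rather than the paper's exact procedure.

import numpy as np

def cross_confusion(C, in_A):
    """Total confusion between subset A (where in_A is True) and its complement."""
    A = np.where(in_A)[0]
    B = np.where(~in_A)[0]
    return C[np.ix_(A, B)].sum() + C[np.ix_(B, A)].sum()

def split_categories(C):
    """Greedy local search for a 2-way split with low cross-confusion."""
    n = C.shape[0]
    in_A = np.arange(n) % 2 == 0               # arbitrary initial split
    improved = True
    while improved:
        improved = False
        for i in range(n):
            flipped = in_A.copy()
            flipped[i] = ~flipped[i]
            if 0 < flipped.sum() < n and cross_confusion(C, flipped) < cross_confusion(C, in_A):
                in_A, improved = flipped, True
    return np.where(in_A)[0], np.where(~in_A)[0]

def build_tree(C, cats=None):
    """Recursively split the categories to obtain a binary taxonomy."""
    cats = np.arange(C.shape[0]) if cats is None else cats
    if len(cats) <= 2:
        return list(cats)
    A, B = split_categories(C[np.ix_(cats, cats)])
    return [build_tree(C, cats[A]), build_tree(C, cats[B])]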
12:30-1:30 Lunch Break Lunch on the 6th floor balcony. The remainder of the meeting will be held in 6011 DBH.
1:30-1:55 Larry Matthies JPL Visual navigation for planetary landers and balloons Computer Vision Group, JPL This talk will start with a quick overview of the range of research activities in the Computer Vision Group at JPL, then address our current main focus of work for NASA, which is visual navigation for landers and balloons in planetary exploration. Under DARPA and Army sponsorship, we are addressing (1) real-time detection and tracking of people around unmanned ground vehicles for safety, (2) self-supervised learning of terrain traversability, and (3) physics-based terrain classification of water and mud hazards for off-road driving. For planetary landers, desired capabilities include real-time, onboard landmark recognition for precision landing, feature tracking for terrain-relative velocity estimation, and landing hazard detection. We are addressing these functions using mono or stereo visible spectrum imagery, as well as lidar, with FPGAs as the primary approach to real-time implementation in space. The main possibility for a balloon mission is to Saturn's moon Titan, which is about the size of our moon but has an atmosphere 1.5x denser than Earth's. Titan's atmosphere admits visible imaging of the surface from a balloon, but from orbit the surface is only resolvable in near infrared or radar wavelengths. Therefore, we are developing methods for balloon position estimation that match visible imagery from a balloon to maps created with orbital near infrared or radar imagery. Unmanned air vehicles (UAVs) on Earth have some visual navigation problems that are closely related to those for planetary landers and balloons, so one of our future goals is to broaden our work on vision for planetary landers to address UAV navigation as well.
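As a toy illustration of localizing by matching a camera view against a map, the snippet below runs single-scale normalized cross-correlation with OpenCV; the filenames are hypothetical, and the actual cross-modal (visible to near-infrared/radar) matching methods are considerably more involved.

import cv2

# hypothetical files: an orbital map tile and a nadir-looking balloon frame
orbital_map = cv2.imread("orbital_map.png", cv2.IMREAD_GRAYSCALE)
balloon_frame = cv2.imread("balloon_view.png", cv2.IMREAD_GRAYSCALE)

# correlate the (smaller) balloon view against the map and take the best match
scores = cv2.matchTemplate(orbital_map, balloon_frame, cv2.TM_CCOEFF_NORMED)
_, best_score, _, (x, y) = cv2.minMaxLoc(scores)
print(f"best match at map pixel ({x}, {y}) with NCC score {best_score:.2f}")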
2:00-2:20 Zhuowen Tu UCLA Auto-context and Its Application to High-level Vision and Medical Imaging Applications Zhuowen Tu The notion of using context information for solving high-level vision problems and medical imaging tasks has been increasingly recognized in the field. However, how to learn an effective and efficient context model, together with the image appearance, remains mostly unknown. The current literature using Markov Random Fields (MRFs) and Conditional Random Fields (CRFs) often involves specific algorithm design, in which the modeling and computing stages are studied in isolation. We propose an auto-context algorithm. Given a set of training images and their corresponding label maps, we first learn a classifier on local image patches. The discriminative probability (or classification confidence) maps created by the learned classifier are then used as context information, in addition to the original image patches, to train a new classifier. The algorithm then iterates to approach the ground truth. Auto-context learns an integrated low-level and context model, and is very general and easy to implement. It selects and fuses a large number of low-level appearance features, with implicit context and shape information, through a sequence of discriminative models. Under nearly identical parameter settings in training, we apply the algorithm to three challenging vision applications: foreground/background segregation, human body configuration, and scene region labeling. Context also plays a very important role in medical/brain images, where the anatomical structures are largely constrained in position. With only slight changes (from 2D to 3D features), the auto-context algorithm applied to brain imaging is shown to outperform many state-of-the-art algorithms specifically designed for brain MRI image segmentation. Further, the scope of the proposed algorithm goes beyond image analysis, and it has the potential to be used for a wide variety of multivariate labeling problems.
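The iterative scheme can be summarized in a short sketch: train on appearance features, turn the resulting posterior maps into extra context features sampled at fixed offsets, and retrain. The offsets, classifier, and feature layout below are illustrative assumptions, not the authors' implementation.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

OFFSETS = [(0, 0), (-3, 0), (3, 0), (0, -3), (0, 3)]   # context sampling points

def context_features(prob_map, offsets=OFFSETS):
    """Sample the current posterior map at fixed offsets around every pixel."""
    feats = []
    for dy, dx in offsets:
        shifted = np.roll(np.roll(prob_map, dy, axis=0), dx, axis=1)  # wrap-around shift
        feats.append(shifted.reshape(-1, 1))
    return np.hstack(feats)

def train_auto_context(images, labels, n_stages=3):
    """images: list of (H, W, D) appearance-feature arrays; labels: (H, W) {0,1} maps."""
    stages = []
    prob_maps = [np.full(img.shape[:2], 0.5) for img in images]       # uniform prior
    for _ in range(n_stages):
        per_img_X = [np.hstack([img.reshape(-1, img.shape[2]), context_features(pm)])
                     for img, pm in zip(images, prob_maps)]
        y = np.hstack([lab.ravel() for lab in labels])
        clf = RandomForestClassifier(n_estimators=50).fit(np.vstack(per_img_X), y)
        stages.append(clf)
        # posteriors of the new classifier become the context for the next stage
        prob_maps = [clf.predict_proba(Xi)[:, 1].reshape(img.shape[:2])
                     for Xi, img in zip(per_img_X, images)]
    return stages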
2:25-2:45 Carolina Galleguillos UCSD Weakly Supervised Object Recognition and Localization with Stable Segmentations Carolina Galleguillos, Boris Babenko, Andrew Rabinovich and Serge Belongie Multiple Instance Learning (MIL) provides a framework for training a discriminative classifier from data with ambiguous labels. This framework is well suited for the task of learning object classifiers from weakly labeled image data, where only the presence of an object in an image is known, but not its location. Some recent work has explored the application of MIL algorithms to the tasks of image categorization and natural scene classification. In this paper we extend these ideas in a framework that uses MIL to recognize and localize objects in images. To achieve this, we employ state-of-the-art image descriptors and multiple stable segmentations. These components, combined with a powerful MIL algorithm, form our object recognition system called MILSS. We show highly competitive object categorization results on the Caltech dataset. To evaluate the performance of our algorithm further, we introduce the challenging Landmarks-18 dataset, a collection of photographs of famous landmarks from around the world. The results on this new dataset show the great potential of our proposed algorithm.
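A tiny illustration of the MIL ingredient: an image is a bag of candidate segments labeled only at the image level, and the bag probability is combined from instance probabilities with a noisy-OR; the numbers are placeholders and this is not the MILSS system itself.

import numpy as np

def bag_probability(instance_probs):
    """Noisy-OR: the bag is positive if at least one instance is positive."""
    return 1.0 - np.prod(1.0 - np.asarray(instance_probs))

# posterior of "contains the object" for each stable segment of one image
segment_probs = [0.05, 0.10, 0.85, 0.20]
print(bag_probability(segment_probs))     # high value -> object present somewhere
print(int(np.argmax(segment_probs)))      # most responsible segment -> localization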
2:50-3:10 Evgeniy Bart Caltech Unsupervised learning of visual taxonomies Bart, Porteous, Welling, Perona I will describe a method for organizing multiple visual categories into a taxonomy. Such organization becomes crucial as the number of available categories increases. The proposed method extends current non-parametric Bayesian techniques such as Nested Chinese Restaurant Process (NCRP). The method is completely unsupervised. It discovers commonalities among images and exploits these commonalities to represent images compactly in a hierarchical manner. Visual categories emerge and become organized in a taxonomy automatically during this process.
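For concreteness, here is a minimal sampler for the nested Chinese Restaurant Process prior that such taxonomy models build on: each image draws a root-to-leaf path, choosing existing branches in proportion to their popularity or opening a new branch. The depth, concentration parameter, and dict-based tree are illustrative assumptions.

import random

def sample_ncrp_path(tree, depth, gamma=1.0):
    """tree: nested dict of nodes {child_id: {"count": int, "children": {...}}}."""
    path, node = [], tree
    for _ in range(depth):
        children = node["children"]
        total = sum(c["count"] for c in children.values()) + gamma
        r = random.uniform(0, total)
        for cid, child in children.items():
            r -= child["count"]
            if r <= 0:                         # follow an existing branch
                chosen = cid
                break
        else:                                  # open a new branch
            chosen = max(children, default=-1) + 1
            children[chosen] = {"count": 0, "children": {}}
        children[chosen]["count"] += 1
        path.append(chosen)
        node = children[chosen]
    return path

root = {"count": 0, "children": {}}
print([sample_ncrp_path(root, depth=3) for _ in range(5)])   # later paths tend to share prefixes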
3:15-3:35 Nikhil Rasiwasia UCSD Image Retrieval on Contextual Spaces Nikhil Rasiwasia and Nuno Vasconcelos Current image retrieval techniques have difficulty retrieving images that belong to the same class as the query image but exhibit distinct visual patterns. Previous attempts to improve generalization have shown that the introduction of semantic representations can mitigate this problem. We extend the existing query-by-semantic-example (QBSE) retrieval paradigm by adding a second layer of semantic representation. At the first level, the representation is driven by patch-based visual features. Semantic concepts, from a pre-defined vocabulary, are modeled as Gaussian mixtures on a visual feature space, and images as vectors of posterior probabilities of containing each of the semantic concepts. At the second level, the representation is purely semantic. Semantic concepts are modeled as Dirichlet mixtures on the semantic feature space of QBSE, and images are again represented as vectors of posterior concept probabilities. The proposed retrieval strategy is able to cope with the ambiguities of patch-based classification, exhibiting significantly better generalization than previous methods.
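The vector-of-posteriors representation lends itself to a very small retrieval sketch: each image is a distribution over a concept vocabulary, and the database is ranked by KL divergence to the query. The concept posteriors below are placeholder numbers standing in for the outputs of the learned mixtures.

import numpy as np

def kl_divergence(p, q, eps=1e-10):
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# each image is a distribution over a fixed concept vocabulary (sums to 1)
query = [0.60, 0.25, 0.10, 0.05]                 # e.g. [sky, mountain, water, people]
database = {
    "img_a": [0.55, 0.30, 0.10, 0.05],
    "img_b": [0.05, 0.05, 0.20, 0.70],
}
ranking = sorted(database, key=lambda name: kl_divergence(query, database[name]))
print(ranking)                                   # img_a is ranked closest to the query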
3:40-4:00 Coffee Break
4:00-4:25 Hartmut Neven Google Object and Face Recognition at Google Hartwig Adam, Ulrich Buddemeier, Johannes Steffens, German Cheung, Jiayong Zhang, Jinjun Xu, Alessandro Bissacco, Fernando Brucher, Carolina Galleguillos, John Flynn, Hartmut Neven I will give an overview of research we are conducting at Google to improve on the state of the art in object and face recognition. The technology used to detect and blur faces and license plates in Street View will be discussed, as well as the face recognition technologies employed in the recent Picasa update that supports users in name-tagging their photos. I will report on our efforts to automatically recognize landmarks in photos, which have reached an unprecedented level of accuracy. Finally, I will describe work to improve image matching and machine learning using adiabatic quantum computing. Specifically, I will discuss an end-to-end implementation developed in collaboration with D-Wave that performs image matching using a 28-qubit chip, and newer theoretical work that maps the training of a binary classifier to a format amenable to the quantum adiabatic algorithm.
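One published style of such a mapping (a QBoost-like construction, shown here only as a hedged illustration and not necessarily the formulation discussed in the talk) selects binary weights over weak classifiers by minimizing a quadratic loss plus a sparsity penalty, which is exactly a QUBO; a brute-force search stands in for the quantum hardware.

import itertools
import numpy as np

H = np.array([[+1, -1, +1],        # H[s, i] = output of weak classifier i on sample s
              [+1, +1, -1],
              [-1, -1, +1],
              [-1, +1, -1]])
y = np.array([+1, +1, -1, -1])     # sample labels
lam, N = 0.1, H.shape[1]

# expand sum_s ((1/N) sum_i w_i H[s,i] - y_s)^2 + lam * sum_i w_i into w^T Q w
Q = (H.T @ H) / N**2
Q[np.diag_indices(N)] += lam - (2.0 / N) * (H.T @ y)

best = min(itertools.product([0, 1], repeat=N),
           key=lambda w: np.array(w) @ Q @ np.array(w))
print("selected weak classifiers:", best)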
4:30-4:50 Vincent Rabaud UCSD Manifold Learning Techniques for Non-Rigid Structure Vincent Rabaud and Serge Belongie Non-rigid structure from motion (NRSFM) can be cast as a manifold learning problem. Two techniques will be presented, depending on whether the manifold is assumed to be linear or not.
4:55-5:15 Piotr Dollar Caltech Multiple Component Learning for Object Detection Piotr Dollar, Boris Babenko, Serge Belongie, Pietro Perona and Zhuowen Tu Object detection is one of the key problems in computer vision. In the last decade, discriminative learning approaches have proven effective in detecting rigid objects, achieving very low false positive rates. The field has also seen a resurgence of part-based recognition methods, with impressive results on highly articulated, diverse object categories. In this paper we propose a discriminative learning approach for detection that is inspired by part-based recognition approaches. Our method, Multiple Component Learning (MCL), automatically learns individual component classifiers and combines these into an overall classifier. Unlike previous methods, which rely on either fairly restricted part models or labeled part data, MCL learns powerful component classifiers in a weakly supervised manner, where object labels are provided but part labels are not. The basis of MCL lies in learning a set classifier; we achieve this by combining boosting with weakly supervised learning, specifically the Multiple Instance Learning (MIL) framework. MCL is general, and we demonstrate results on a range of data from computer audition and computer vision. In particular, MCL outperforms all existing methods on the challenging INRIA pedestrian detection dataset, and unlike methods that are not part-based, MCL is quite robust to occlusions.
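One way to see the weakly supervised component-learning idea is an MI-SVM-style loop that alternates between picking the highest-scoring "witness" patch in each positive image and retraining a standard classifier; the linear SVM and feature layout are placeholder choices, not the MCL algorithm itself.

import numpy as np
from sklearn.svm import LinearSVC

def train_component(pos_bags, neg_bags, n_rounds=5):
    """pos_bags / neg_bags: lists of (n_patches, d) feature arrays, one per image."""
    witnesses = [bag[0] for bag in pos_bags]   # arbitrary initial witness patches
    clf = None
    for _ in range(n_rounds):
        X = np.vstack(witnesses + [patch for bag in neg_bags for patch in bag])
        y = np.hstack([np.ones(len(witnesses)),
                       np.zeros(sum(len(bag) for bag in neg_bags))])
        clf = LinearSVC(C=1.0).fit(X, y)
        # re-select, in each positive image, the patch the classifier is most confident about
        witnesses = [bag[np.argmax(clf.decision_function(bag))] for bag in pos_bags]
    return clf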
5:20-5:40 Pradeep Natarajan USC View and Scale Invariant Action Recognition Using Multiview Shape-Flow Models Pradeep Natarajan, Ramakant Nevatia Actions in real world applications typically take place in cluttered environments with large variations in the orientation and scale of the actor. We present an approach to simultaneously track and recognize known actions that is robust to such variations, starting from a person detection in the standing pose. In our approach we first render synthetic poses from multiple viewpoints using Mocap data for known actions and represent them in a Conditional Random Field (CRF) whose observation potentials are computed using shape similarity and whose transition potentials are computed using optical flow. We enhance these basic potentials with terms to represent spatial and temporal constraints, and call our enhanced model the Shape, Flow, Duration Conditional Random Field (SFD-CRF). We find the best sequence of actions using Viterbi search in the SFD-CRF. We demonstrate our approach on videos from multiple viewpoints and in the presence of background clutter.
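The decoding step is standard Viterbi over a chain; the sketch below runs it on random log-potentials standing in for the SFD-CRF's shape, flow, and duration terms.

import numpy as np

def viterbi(log_unary, log_trans):
    """log_unary: (T, S) per-frame state scores; log_trans: (S, S) transition scores."""
    T, S = log_unary.shape
    score = log_unary[0].copy()
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans          # combined (previous, current) scores
        backptr[t] = np.argmax(cand, axis=0)
        score = cand[backptr[t], np.arange(S)] + log_unary[t]
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):                  # trace the best sequence backwards
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

rng = np.random.default_rng(0)
print(viterbi(rng.standard_normal((6, 4)), rng.standard_normal((4, 4))))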
5:45-6:05 Merrielle Spain Caltech Some Objects Are More Equal Than Others: Measuring and Predicting Importance Merrielle Spain and Pietro Perona We observe that everyday images contain dozens of objects, and that humans, in describing these images, give different priority to these objects. We argue that a goal of visual recognition is, therefore, not only to detect and classify objects but also to associate with each a level of priority which we call ‘importance’. We propose a definition of importance and show how this may be estimated reliably from data harvested from human observers. We conclude by showing that a first-order estimate of importance may be computed from a number of simple image region measurements and does not require access to image meaning.
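As a purely hypothetical illustration of a first-order estimate from simple image region measurements, one could regress importance onto a couple of region statistics; the features, numbers, and model below are assumptions, not the paper's method.

import numpy as np
from sklearn.linear_model import Ridge

# rows: [relative area of region, normalized distance of its centroid from image center]
X = np.array([[0.30, 0.10], [0.05, 0.45], [0.15, 0.25], [0.02, 0.60]])
y = np.array([0.9, 0.2, 0.5, 0.1])                # importance, e.g. human naming frequency
model = Ridge(alpha=1.0).fit(X, y)
print(model.predict([[0.20, 0.15]]))              # predicted importance of a new region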
6:15-8:00 Dinner Dinner at the University Club, within walking distance just west of Bren Hall. We are eating in Room C.