What Motion Reveals about Shape with Unknown Material Behavior
NEC Labs America
Date : 11/12/2014
Image formation is an outcome of a complex interaction between object geometry, lighting and camera, as governed by the reflectance of the underlying material. Psychophysical studies show that motion of the object, light source or camera is an important cue for shape perception from image sequences. However, due to the complex and often unknown nature of the bidirectional reflectance distribution function (BRDF) that determines material behavior, computer vision algorithms have traditionally relied on simplifying assumptions such as brightness constancy or Lambertian reflectance. We take a step towards overcoming those limitations by answering a fundamental question: what does motion reveal about unknown shape and material? In each case of light source, object or camera motion, we show that physical properties of BRDFs yield PDE invariants that precisely characterize the extent of shape recovery under a given imaging condition. Conventional optical flow, multiview stereo and photometric stereo follow as special cases. This leads to the surprising result that motion can decipher shape even with complex, unknown material behavior and unknown lighting. Further, we show that contrary to intuition, joint recovery of shape, material and lighting using motion cues is often well-posed and tractable, requiring the solution of only sparse linear systems.
At the beginning of the talk, I will also describe our recent work on 3D scene understanding for autonomous driving. Using a single camera, we demonstrate real-time structure from motion performance on par with stereo, on the challenging KITTI benchmark. Combined with top-performing methods in object detection and tracking, we demonstrate 3D object localization with high accuracy comparable to LIDAR. We demonstrate high-level applications such as scene recognition that form the basis for collaborations on collision avoidance and danger prediction with automobile manufacturers.
Manmohan Chandraker received a B.Tech. in Electrical Engineering at the Indian Institute of Technology, Bombay and a PhD in Computer Science at the University of California, San Diego. Following a postdoctoral scholarship at the University of California, Berkeley, he joined NEC Labs America in Cupertino, where he conducts research in computer vision. His principal research interests are modern optimization methods for geometric 3D reconstruction, 3D scene understanding and recognition for autonomous driving and shape recovery in the presence of complex illumination and material behavior. His work has received the Marr Prize Honorable Mention for Best Paper at ICCV 2007, the 2009 CSE Dissertation Award for Best Thesis at UC San Diego, a nomination for the 2010 ACM Dissertation Award and the Best Paper Award at CVPR 2014, besides appearing in Best Paper Special Issues of IJCV 2009, IEEE PAMI 2011 and 2014.
Scalable Scientific Image Informatics
University of California
Date : 25/09/2014
Recent advances in microscopy imaging, image processing and computing technologies enable large-scale scientific experiments that not only generate large collections of images and video, but also pose new computing and information processing challenges. These include providing ubiquitous access to images, videos and metadata resources; creating easily accessible image and video analysis, visualizations and workflows; and publishing both data and analysis resources. Further, contextual metadata, such as experimental conditions in biology, are critical for quantitative analysis. Streamlining collaborative efforts across distributed research teams with online virtual environments will improve scientific productivity, enhance understanding of complex phenomena and allow a growing number of researchers to quantify conditions based on image evidence that so far have remained subjective. This talk will focus on recent work in my group on image segmentation and quantification, followed by a detailed description of the BisQue platform. BisQue (Bio-Image Semantic Query and Environment) is an open-source platform for integrating image collections, metadata, analysis, visualization and database methods for querying and search. We have developed new techniques for managing user-defined data models for biological datasets, including experimental protocols, images, and analysis. BisQue is currently used in many laboratories around the world and is integrated into the iPlant cyber-infrastructure (see http://www.iplantcollaborative.org), which serves the plant biology community. For more information on BisQue see http://www.bioimage.ucsb.edu.
B. S. Manjunath received the B.E. degree (with distinction) in electronics from Bangalore University, Bangalore, India, in 1985, the M.E. degree (with distinction) in systems science and automation from the Indian Institute of Science, Bangalore, in 1987, and the Ph.D. degree in electrical engineering from University of Southern California, Los Angeles, in 1991. He is a Professor of electrical and computer engineering, Director of the Center for Bio-Image Informatics, and Director of the newly established Center on Multimodal Big Data Science and Healthcare at the University of California, Santa Barbara. His current research interests include image processing, distributed processing in camera networks, data hiding, multimedia databases, and Bio-image informatics. He has published over 250 peer-reviewed articles on these topics and is a co-editor of the book Introduction to MPEG-7 (Wiley, 2002). He was an associate editor of the IEEE Transactions on Image Processing, Pattern Analysis and Machine Intelligence, Multimedia, Information Forensics, IEEE Signal Processing Letters, and is currently an AE for the BMC Bio Informatics Journal. He is a co-author of the paper that won the 2013 Transactions on Multimedia best paper award and is a fellow of the IEEE.
Rounding-based Moves for Metric Labeling
Ecole Centrale Paris
Date : 27/08/2014
Metric labeling is an important special case of energy minimization in pairwise graphical models. The dominant methods for metric labeling in the computer vision community belong to the move-making family, due to their computational efficiency. The dominant methods in the computer science community belong to the convex relaxations family, due to their strong theoretical guarantees. In this talk, I will present algorithms that combine the best of both worlds: efficient move-making algorithms that provide the same guarantees as the standard linear programming relaxation.
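For readers unfamiliar with the term, metric labeling is usually stated as the energy minimization below. The notation (variables V, edges E, label set L, unary potentials \theta_a, edge weights w_{ab}, distance d) is the common textbook form and is assumed here; it is not taken from the talk itself.

```latex
\min_{f : V \to L} \;\; \sum_{a \in V} \theta_a(f_a) \;+\; \sum_{(a,b) \in E} w_{ab}\, d(f_a, f_b),
\qquad w_{ab} \ge 0,

\text{where } d : L \times L \to \mathbb{R}_{\ge 0} \text{ is a metric:} \quad
d(i,j) = 0 \iff i = j, \qquad
d(i,j) = d(j,i), \qquad
d(i,j) \le d(i,k) + d(k,j).
```

Move-making methods lower this energy by repeatedly changing the labels of a subset of variables, whereas the linear programming relaxation optimizes over fractional labelings and then rounds them.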
M. Pawan Kumar is an Assistant Professor at Ecole Centrale Paris, and a member of INRIA Saclay. Prior to that, he was a PhD student at Oxford Brookes University (Brookes Vision Group; 2003-2007), a postdoc at Oxford University (Visual Geometry Group; 2008), and a postdoc at Stanford University (Daphne's Approximate Group of Students; 2009-2011).
Signal Processing Meets Optics: Theory, Design and Inference for Computational Imaging
Date : 24/04/2014
For centuries, imaging devices have been based on the principle of 'pinhole projection', which directly captures the desired image. However, it has recently been shown that by co-designing the imaging optics and processing algorithms, we can obtain Computational Imaging (CI) systems that far exceed the performance of traditional cameras. Despite the advances made in designing new computational cameras, there are still many open issues such as: 1) lack of a proper theoretical framework for analysis and design of CI cameras, 2) lack of camera designs for capturing the various dimensions of light with high fidelity and 3) lack of proper use of data-driven methods that have shown tremendous success in other domains.
In this talk, I will address the above-mentioned issues. First, I will present a comprehensive framework for analysis of computational imaging systems and provide explicit performance guarantees for many CI systems such as light field and extended-depth-of-field cameras. Second, I will show how camera arrays can be exploited to capture the various dimensions of light such as spectrum and angle. Capturing these dimensions leads to novel imaging capabilities such as post-capture refocussing, hyper-spectral imaging and natural image retouching. Finally, I will talk about how various machine learning techniques such as robust regression and matrix factorization can be used for solving many imaging problems.
Kaushik Mitra is currently a postdoctoral research associate in the Electrical and Computer Engineering department of Rice University. His research interests are in computational imaging, computer vision and statistical signal processing. He earned his Ph.D. in Electrical and Computer Engineering from the University of Maryland, College Park, where his research focus was on the development of statistical models and optimization algorithms for computer vision problems.
Cross-modal Retrieval: Retrieval Across Different Content Modalities
Yahoo Labs, Bangalore
Date : 01/04/2014
In this talk, the problem of cross-modal retrieval from multimedia repositories is considered. This problem addresses the design of retrieval systems that support queries across content modalities, e.g., using an image to search for texts. A mathematical formulation is proposed, equating the design of cross-modal retrieval systems to that of isomorphic feature spaces for different content modalities. Two hypotheses are then investigated, regarding the fundamental attributes of these spaces. The first is that low-level cross-modal correlations should be accounted for. The second is that the space should enable semantic abstraction. Three new solutions to the cross-modal retrieval problem are then derived from these hypotheses: correlation matching (CM), an unsupervised method which models cross-modal correlations; semantic matching (SM), a supervised technique that relies on semantic representation; and semantic correlation matching (SCM), which combines both. An extensive evaluation of retrieval performance is conducted to test the validity of the hypotheses. All approaches are shown to be successful for text retrieval in response to image queries and vice versa. It is concluded that both hypotheses hold, in a complementary form, although the evidence in favor of the abstraction hypothesis is stronger than that for correlation.
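As a rough illustration of the correlation-matching idea, the sketch below learns a shared space for two modalities and retrieves across them by nearest neighbour. It assumes canonical correlation analysis (CCA) as the correlation model, which the abstract does not specify, and it uses synthetic "image" and "text" features generated from a common latent vector. The function names (fit_cca, retrieve, demo) and all parameter values are hypothetical.

```python
import numpy as np


def inv_sqrt(mat):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def fit_cca(x, y, k, reg=1e-6):
    """Fit CCA (assumed correlation model); return means, projections, correlations."""
    mx, my = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - mx, y - my
    n = x.shape[0]
    # Regularized covariance and cross-covariance estimates.
    cxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])
    cyy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    cxy = xc.T @ yc / (n - 1)
    # Whiten each modality, then take the SVD of the whitened cross-covariance.
    kx, ky = inv_sqrt(cxx), inv_sqrt(cyy)
    u, s, vt = np.linalg.svd(kx @ cxy @ ky)
    wx = kx @ u[:, :k]
    wy = ky @ vt.T[:, :k]
    return mx, my, wx, wy, s[:k]


def retrieve(queries, gallery):
    """Index of the nearest gallery item (Euclidean) for each query."""
    dist = np.linalg.norm(queries[:, None, :] - gallery[None, :, :], axis=2)
    return dist.argmin(axis=1)


def demo(seed=0, n=200, latent=3, dx=10, dy=8, noise=0.01):
    """Synthetic paired data: both modalities are noisy linear views of z."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, latent))
    x = z @ rng.normal(size=(latent, dx)) + noise * rng.normal(size=(n, dx))
    y = z @ rng.normal(size=(latent, dy)) + noise * rng.normal(size=(n, dy))
    mx, my, wx, wy, corr = fit_cca(x, y, k=latent)
    # Project both modalities into the shared space.
    px, py = (x - mx) @ wx, (y - my) @ wy
    # "Image" queries against the "text" gallery; item i should retrieve item i.
    # Accuracy is measured on the training pairs, so it is optimistic.
    hits = retrieve(px, py) == np.arange(n)
    return corr, hits.mean()


if __name__ == "__main__":
    correlations, accuracy = demo()
    print("canonical correlations:", correlations)
    print("image-to-text retrieval accuracy:", accuracy)
```

Semantic matching would instead map each modality to a vector of class posteriors before comparing, which requires labels and is not shown here.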
Nikhil Rasiwasia received the B.Tech degree in electrical engineering from the Indian Institute of Technology Kanpur (India) in 2005. He received the MS and PhD degrees from the University of California, San Diego in 2007 and 2011 respectively, where he was a graduate student researcher at the Statistical Visual Computing Laboratory, in the ECE department. Currently, he is working as a scientist at Yahoo Labs, Bangalore, India. In 2008, he was recognized as an 'Emerging Leader in Multimedia' by IBM T. J. Watson Research. He also received the best student paper award at the ACM Multimedia conference in 2010. His research interests are in the areas of computer vision and machine learning, in particular applying machine learning solutions to computer vision problems.
Lifting 3D Manhattan Lines from a Single Image
University of California
Date : 10/01/2014
In the first part of the talk, I will present a novel and efficient method for reconstructing the 3D arrangement of lines extracted from a single image, using vanishing points, orthogonal structure, and an optimization procedure that considers all plausible connectivity constraints between lines. Line detection identifies a large number of salient lines that intersect or nearly intersect in an image, but relatively few of these apparent junctions correspond to real intersections in the 3D scene. We use linear programming (LP) to identify a minimal set of least-violated connectivity constraints that are sufficient to unambiguously reconstruct the 3D lines. In contrast to prior solutions that primarily focused on well-behaved synthetic line drawings with severely restrictive assumptions, we develop an algorithm that can work on real images. The algorithm produces line reconstructions by identifying 95% correct connectivity constraints in the York Urban database, with a total computation time of 1 second per image.
In the second part of the talk, I will briefly mention my other work in graphical models, robotics, geo-localization, generic camera modeling and 3D reconstruction.
Srikumar Ramalingam is a Principal Research Scientist at Mitsubishi Electric Research Lab (MERL). He received his B.E. from Anna University (Guindy) in India and his M.S. from the University of California (Santa Cruz) in the USA. He received a Marie Curie Fellowship from the European Union to pursue his studies at INRIA Rhone Alpes (France) and he obtained his PhD in 2007. His thesis on generic imaging models received the INPG best thesis prize and the AFRIF thesis prize (honorable mention) from the French Association for Pattern Recognition. After his PhD, he spent two years in Oxford working as a research associate at Oxford Brookes University, while being an associate member of the Visual Geometry Group at Oxford University. He has published more than 30 papers in flagship conferences such as CVPR, ICCV, SIGGRAPH ASIA and ECCV. He has co-edited journals, coauthored books, given tutorials and organized workshops on topics such as multi-view geometry and discrete optimization. His research interests are in computer vision, machine learning and robotics problems.