Designing Physically-Motivated CNNs for Shape and Material
NEC Labs America
Date : 26/12/2017
Image formation is a physical phenomenon that often includes complex factors like shape deformations, occlusions, material properties and participating media. Consequently, practical deployment of intelligent vision-based systems requires robustness to the effects of diverse factors. Such effects may be inverted by modeling the image formation process, but hand-crafted features and hard-coded rules face limitations for data inconsistent with the model. Recent advances in deep learning have led to impressive performance, but generalization of a purely data-driven approach to handle such complex effects is expensive. Thus, an avenue for successfully handling the diversity of real-world images is through incorporation of physical models of image formation within deep learning frameworks.
This talk presents some of our recent works on the design of convolutional neural networks (CNNs) that incorporate physical insights from image formation to learn 3D shape, semantics and material properties, with benefits such as higher accuracy, better generalization or greater ease of training. To handle complex materials, we propose novel 4D CNN architectures for material recognition from a single light field image and model bidirectional reflectance distribution functions (BRDFs) to acquire spatially-varying material properties from a single mobile phone image. To handle participating media, we propose a generative adversarial network (GAN) that inverts the effects of complex distortions induced by a turbulent refractive interface. To recover semantic 3D shape despite occlusions, we model the rendering process as deep supervision for intermediate layers in a CNN, which effectively bridges domain gaps to yield superior performance on real images despite training purely on simulations. Finally, we demonstrate state-of-the-art face recognition from profile views through an adversarial learning framework that frontalizes input faces using 3D morphable models.
Manmohan Chandraker is an assistant professor at the CSE department of the University of California, San Diego and heads computer vision research at NEC Labs America. He received a PhD from UCSD and was a postdoctoral scholar at UC Berkeley. His research interests are 3D scene understanding and reconstruction, with applications to autonomous driving and human-computer interfaces. His works have received the Marr Prize Honorable Mention for Best Paper at ICCV 2007, the 2009 CSE Dissertation Award for Best Thesis at UCSD, a PAMI special issue on best papers of CVPR 2011 and the Best Paper Award at CVPR 2014.
Facial Analysis and Synthesis for Photo Editing and Virtual Reality
Georgia Institute of Technology
Date : 25/10/2017
In this talk, the speaker will discuss some techniques built around facial analysis and synthesis, focusing broadly on two application areas:
Photo editing: We are motivated by the observation that group photographs are seldom perfect. Subjects may have inadvertently closed their eyes, may be looking away, or may not be smiling at that moment. He will describe how we can combine multiple photos of people into one shot, optimizing desirable facial attributes such as smiles and open eyes. Our approach leverages advances in facial analysis as well as image stitching to generate high-quality composites.
Headset removal for Virtual Reality: Experiencing VR requires users to wear a headset, but it occludes the face and blocks eye-gaze. He will demonstrate a technique that virtually “removes” the headset and reveals the face underneath it, creating a realistic see-through effect. Using a combination of 3D vision, machine learning and graphics techniques, we synthesize a realistic, personalized 3D model of the user's face that allows us to reproduce their appearance, eye gaze and blinks, which are otherwise hidden by the VR headset.
Vivek Kwatra is a Research Scientist in Machine Perception at Google. He received his B.Tech. in Computer Science & Engineering from IIT Delhi and his Ph.D. in Computer Science from Georgia Tech. He spent two years at UNC Chapel Hill as a Postdoctoral Researcher. His interests lie in computer vision, computational video and photography, facial analysis and synthesis, and virtual reality. He has authored many papers and patents on these topics, and his work has found its way into several Google products.
Perceptual Quality Assessment of Real-World Images and Videos
Dr. Deepti Ghadiyaram
University of Texas at Austin
Date : 22/09/2017
The development of several online social-media venues and rapid advances in technology by camera and mobile device manufacturers have led to the creation and consumption of a seemingly limitless supply of visual content. However, a vast majority of these digital images and videos are afflicted with annoying artifacts during acquisition, subsequent storage and transmission over the network. Existing automatic quality predictors are designed on unrepresentative databases that only model single, synthetic distortions (such as blur, compression, and so on). As a result, these models lose their prediction capability when evaluated on pictures and videos that are captured using typical real-world mobile camera devices and contain complex mixtures of multiple distortions. For over-the-top video streaming, all existing quality of experience (QoE) prediction models fail to model the behavioral responses of our visual system to stimuli containing stalls and playback interruptions.
In her talk, she will focus on two broad topics: 1) construction of distortion-representative image and video quality assessment databases and 2) design of novel quality predictors for real-world images and videos. The image and video quality predictors that she presents in this talk rely on models based on the natural scene statistics of images and videos, model the complex non-linearities and linearities in the human visual system, and effectively capture a viewer’s behavioral responses to unpleasant stimuli (distortions).
Deepti Ghadiyaram received her PhD in Computer Science from the University of Texas at Austin in August 2017. Her research interests broadly include image and video processing, computer vision, and machine learning. Her PhD work focused on perceptual image and video quality assessment, particularly on building accurate quality prediction models for pictures and videos captured in the wild and understanding a viewer’s time-varying quality of experience while streaming videos. She was a recipient of UT Austin’s Microelectronics and Computer Development (MCD) Fellowship from 2013 to 2014 and of a Graduate Student Fellowship from the Department of Computer Science for the academic years 2013-2016.
Machine Learning Seminar Series
Dr. Soumith Chintala
Facebook AI Research
Date : 14/02/2017
In this talk, we will first go over the PyTorch deep learning framework, its architecture, code and the new research paradigms that it enables. After that, we shall go over the latest developments in Generative Adversarial Networks, and introduce Wasserstein GANs -- a reformulation of GANs that is more stable and fixes optimization problems that traditional GANs face.
Dr. Soumith is a Researcher at Facebook AI Research, where he works on deep learning, reinforcement learning, generative image models, agents for video games and large-scale high-performance deep learning. Prior to joining Facebook in August 2014, he worked at MuseAmi, where he built deep learning models for music and vision targeted at mobile devices. He holds a Masters in CS from NYU, and spent time in Yann LeCun's NYU lab building deep learning models for pedestrian detection, natural image OCR and depth images, among others.