Talks and Visits
From 2017

Designing Physically-Motivated CNNs for Shape and Material


Dr. Manmohan Chandraker

NEC Labs America

Date : 26/12/2017

 

Abstract:

Image formation is a physical phenomenon that often includes complex factors like shape deformations, occlusions, material properties and participating media. Consequently, practical deployment of intelligent vision-based systems requires robustness to the effects of diverse factors. Such effects may be inverted by modeling the image formation process, but hand-crafted features and hard-coded rules face limitations for data inconsistent with the model. Recent advances in deep learning have led to impressive performances, but generalization of a purely data-driven approach to handle such complex effects is expensive. Thus, an avenue for successfully handling the diversity of real-world images is through incorporation of physical models of image formation within deep learning frameworks.

This talk presents some of our recent works on the design of convolutional neural networks (CNNs) that incorporate physical insights from image formation to learn 3D shape, semantics and material properties, with benefits such as higher accuracy, better generalization or greater ease of training. To handle complex materials, we propose novel 4D CNN architectures for material recognition from a single light field image and model bidirectional reflectance distribution functions (BRDFs) to acquire spatially-varying material properties from a single mobile phone image. To handle participating media, we propose a generative adversarial network (GAN) that inverts the effects of complex distortions induced by a turbulent refractive interface. To recover semantic 3D shape despite occlusions, we model the rendering process as deep supervision for intermediate layers in a CNN, which effectively bridges domain gaps to yield superior performance on real images despite training purely on simulations. Finally, we demonstrate state-of-the-art face recognition from profile views through an adversarial learning framework that frontalizes input faces using 3D morphable models.

 

Bio:

Manmohan Chandraker is an assistant professor at the CSE department of the University of California, San Diego and heads computer vision research at NEC Labs America. He received a PhD from UCSD and was a postdoctoral scholar at UC Berkeley. His research interests are 3D scene understanding and reconstruction, with applications to autonomous driving and human-computer interfaces. His works have received the Marr Prize Honorable Mention for Best Paper at ICCV 2007, the 2009 CSE Dissertation Award for Best Thesis at UCSD, a PAMI special issue on best papers of CVPR 2011 and the Best Paper Award at CVPR 2014.


 

Facial Analysis and Synthesis for Photo Editing and Virtual Reality


Dr. Vivek Kwatra

Google

Date : 25/10/2017

 

Abstract:

In this talk, the speaker will discuss some techniques built around facial analysis and synthesis, focusing broadly on two application areas:

Photo editing:

We are motivated by the observation that group photographs are seldom perfect. Subjects may have inadvertently closed their eyes, may be looking away, or may not be smiling at that moment. He will describe how we can combine multiple photos of people into one shot, optimizing desirable facial attributes such as smiles and open eyes. Our approach leverages advances in facial analysis as well as image stitching to generate high-quality composites.

Headset removal for Virtual Reality:

Experiencing VR requires users to wear a headset, but it occludes the face and blocks eye gaze. He will demonstrate a technique that virtually “removes” the headset and reveals the face underneath it, creating a realistic see-through effect. Using a combination of 3D vision, machine learning and graphics techniques, we synthesize a realistic, personalized 3D model of the user's face that allows us to reproduce their appearance, eye gaze and blinks, which are otherwise hidden by the VR headset.

 

Bio:

Vivek Kwatra is a Research Scientist in Machine Perception at Google. He received his B.Tech. in Computer Science & Engineering from IIT Delhi and his Ph.D. in Computer Science from Georgia Tech. He spent two years at UNC Chapel Hill as a Postdoctoral Researcher. His interests lie in computer vision, computational video and photography, facial analysis and synthesis, and virtual reality. He has authored many papers and patents on these topics, and his work has found its way into several Google products.


 

Perceptual Quality Assessment of Real-World Images and Videos


Dr. Deepti Ghadiyaram

University of Texas at Austin

Date : 22/09/2017

 

Abstract:

The development of several online social-media venues and rapid advances in technology by camera and mobile device manufacturers have led to the creation and consumption of a seemingly limitless supply of visual content. However, a vast majority of these digital images and videos are afflicted with annoying artifacts during acquisition, subsequent storage and transmission over the network. Existing automatic quality predictors are designed on unrepresentative databases that model only single, synthetic distortions (such as blur, compression, and so on). These models lose their prediction capability when evaluated on real-world pictures and videos captured using typical mobile camera devices, which contain complex mixtures of multiple distortions. Pertaining to over-the-top video streaming, all of the existing quality of experience (QoE) prediction models fail to model the behavioral responses of our visual system to a stimulus containing stalls and playback interruptions.

In her talk, she will focus on two broad topics: 1) construction of distortion-representative image and video quality assessment databases and 2) design of novel quality predictors for real-world images and videos. The image and video quality predictors that she presents in this talk rely on models based on the natural scene statistics of images and videos, model the complex non-linearities and linearities in the human visual system, and effectively capture a viewer’s behavioral responses to unpleasant stimuli (distortions).

 

Bio:

Deepti Ghadiyaram received her PhD in Computer Science from the University of Texas at Austin in August 2017. Her research interests broadly include image and video processing, computer vision, and machine learning. Her Ph.D. work focused on perceptual image and video quality assessment, particularly on building accurate quality prediction models for pictures and videos captured in the wild and on understanding a viewer’s time-varying quality of experience while streaming videos. She was a recipient of UT Austin’s Microelectronics and Computer Development (MCD) Fellowship from 2013 to 2014 and of a Graduate Student Fellowship from the Department of Computer Science for the academic years 2013-2016.


 

Machine Learning Seminar Series


Dr. Soumith Chintala

Facebook AI Research

Date : 14/02/2017

 

Abstract:

In this talk, we will first go over the PyTorch deep learning framework: its architecture, code and the new research paradigms that it enables. After that, we shall go over the latest developments in Generative Adversarial Networks and introduce Wasserstein GANs -- a reformulation of GANs that is more stable and fixes optimization problems that traditional GANs face.
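For context on the second part of the talk, the sketch below shows the core of a Wasserstein GAN critic update in PyTorch (the weight-clipping variant). It is a minimal illustration, not code from the talk; critic, generator, real and opt_c are assumed, user-supplied modules, data and optimizer.

    # Minimal sketch of one WGAN critic update (weight-clipping variant).
    # `critic` and `generator` are assumed to be user-defined nn.Module instances.
    import torch

    def critic_step(critic, generator, real, opt_c, clip=0.01, z_dim=100):
        opt_c.zero_grad()
        z = torch.randn(real.size(0), z_dim)
        fake = generator(z).detach()
        # The critic maximizes E[f(real)] - E[f(fake)]; we minimize the negative.
        loss = critic(fake).mean() - critic(real).mean()
        loss.backward()
        opt_c.step()
        # Crude Lipschitz constraint: clip critic weights to a small box.
        with torch.no_grad():
            for p in critic.parameters():
                p.clamp_(-clip, clip)
        return loss.item()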

 

Bio:

Dr. Soumith Chintala is a Researcher at Facebook AI Research, where he works on deep learning, reinforcement learning, generative image models, agents for video games and large-scale high-performance deep learning. Prior to joining Facebook in August 2014, he worked at MuseAmi, where he built deep learning models for music and vision targeted at mobile devices. He holds a Master's in CS from NYU, and spent time in Yann LeCun's NYU lab building deep learning models for pedestrian detection, natural image OCR and depth images, among others.


 

From 2016

Towards Visual 3D Scene Understanding and Prediction for ADAS

 


Manmohan Chandraker

NEC Labs America

Date : 23/12/2016

 

Abstract:

Modern advanced driver assistance systems (ADAS) rely on a range of sensors including radar, ultrasound, cameras and LIDAR. Active sensors such as radar are primarily used for detecting traffic participants (TPs) and measuring their distance. More expensive LIDAR sensors are used for estimating both traffic participants and scene elements (SEs). However, camera-based systems have the potential to achieve the same capabilities at a much lower cost, while allowing new ones such as determination of TP and SE types as well as their interactions in complex traffic scenes.

In this talk, we present several technological developments for ADAS. A common theme is to overcome challenges posed by the lack of large-scale annotations in deep learning frameworks. We introduce approaches to correspondence estimation that are trained on purely synthetic data but adapt well to real data at test time. Posing the problem in a metric learning framework with fully convolutional architectures allows estimation accuracies that surpass the state of the art by large margins. We introduce object detectors that are light enough for ADAS, trained with knowledge distillation to retain the accuracies of deeper architectures. Our semantic segmentation methods are trained on weak supervision that requires only a tenth of conventional annotation time. We propose methods for 3D reconstruction that use deep supervision to recover fine object part locations, but rely on purely synthetic 3D CAD models. Further, we develop generative adversarial frameworks for reconstruction that alleviate the need to even align the 3D CAD models with images at train time. Finally, we present a framework for TP behavior prediction in complex traffic scenes that utilizes the above as inputs to predict future trajectories that fully account for TP-TP and TP-SE interactions. Our approach allows prediction of diverse uncertain outcomes and is trained with inverse optimal control to predict long-term strategic behaviors in complex scenes.

Bio:

Manmohan Chandraker is an assistant professor at the CSE department of the University of California, San Diego and leads the computer vision research effort at NEC Labs America in Cupertino. He received a B.Tech. in Electrical Engineering at the Indian Institute of Technology, Bombay and a PhD in Computer Science at the University of California, San Diego. His principal research interests are sparse and dense 3D reconstruction, including structure-from-motion, 3D scene understanding and dense modeling under complex illumination or material behavior, with applications to autonomous driving, robotics or human-computer interfaces. His work on provably optimal algorithms for structure and motion estimation received the Marr Prize Honorable Mention for Best Paper at ICCV 2007, the 2009 CSE Dissertation Award for Best Thesis at UC San Diego and was a nominee for the 2010 ACM Dissertation Award. His work on shape recovery from motion cues for complex material and illumination received the Best Paper Award at CVPR 2014.


Learning without exhaustive supervision

 


Ishan Misra

Carnegie Mellon University

Date : 15/12/2016

 

Abstract:

Recent progress in visual recognition can be attributed to large datasets and high-capacity learning models. Sadly, these data-hungry models tend to be supervision-hungry as well. In my research, I focus on algorithms that learn from large amounts of data without exhaustive supervision. The key to making algorithms "supervision efficient" is to exploit the structure or prior properties available in the data or labels, and to model them in the learning algorithm. In this talk, I will focus on three axes to get around having exhaustive annotations: 1) finding structure and natural supervision in data to reduce the need for manual labels: unsupervised and semi-supervised learning from video [ECCV'16, CVPR'15]; 2) sharing information across tasks so that tasks which are easier or "free" to label can help other tasks [CVPR'16]; and 3) finding structure in the labels and the labeling process so that one can utilize labels in the wild [CVPR'16].

Bio:

Ishan Misra is a PhD student at Carnegie Mellon University, working with Martial Hebert and Abhinav Gupta. His research interests are in Computer Vision and Machine Learning, particularly in visual recognition. Ishan got his BTech in Computer Science from IIIT-Hyderabad where he worked with PJ Narayanan. He got the Siebel fellowship in 2014, and has spent two summers as an intern at Microsoft Research, Redmond.


Understanding Stories by Joint Analysis of Language and Vision

 


Makarand Tapaswi

KIT Germany

Date : 14/09/2016

 

Abstract:

Humans spend a large amount of time listening, watching, and reading stories. We argue that the ability to model, analyze, and create new stories is a stepping stone towards strong AI. We thus work on teaching AI to understand stories in films and TV series. To obtain a holistic view of the story, we align videos with novel sources of text such as plot synopses and books. Plots contain a summary of the core story and allow us to obtain a high-level overview. In contrast, books provide rich details about characters, scenes and interactions, allowing us to ground visual information in corresponding textual descriptions. We also work on testing machine understanding of stories by asking it to answer questions. To this end, we create a large benchmark dataset of almost 15,000 questions from 400 movies and explore its characteristics with several baselines.

Bio:

Makarand Tapaswi received his undergraduate education from NITK Surathkal in Electronics and Communications Engineering. Thereafter he pursued an Erasmus Mundus Masters program in Information and Communication Technologies at UPC Barcelona and KIT Germany. He continued with the Computer Vision lab at Karlsruhe Institute of Technology in Germany and recently completed his PhD. He will be going to the University of Toronto as a post-doctoral fellow starting in October.


 

Computer Vision @ Facebook


Manohar Paluri

Facebook AI Research

Date : 26/07/2016

Abstract:

Over the past 5 years the community has made significant strides in the field of Computer Vision. Thanks to large-scale datasets, specialized computing in the form of GPUs, and many breakthroughs in modeling better convnet architectures, Computer Vision systems in the wild at scale are becoming a reality. At Facebook AI Research we want to embark on the journey of making breakthroughs in the field of AI and using them for the benefit of connecting people and helping remove barriers to communication. In that regard, Computer Vision plays a significant role, as the media content coming to Facebook is ever increasing and building models that understand this content is crucial to achieving our mission of connecting everyone. In this talk I will give an overview of how we think about problems related to Computer Vision at Facebook and touch on various aspects related to supervised, semi-supervised and unsupervised learning. I will jump between various research efforts involving representation learning and predictive learning as well. I will highlight some large-scale applications and talk about the limitations of current systems throughout the talk to motivate the reasons to tackle these problems.

Bio:

Manohar Paluri is currently a Research Lead at Facebook AI Research and manages the Computer Vision team in the Applied Research organization. He is passionate about Computer Vision and the longer-term goal of building systems that can perceive the way humans do. While working on various perception efforts throughout his career, he spent considerable time looking at Computer Vision problems in industry and academia. He worked at renowned places like Google Research, IBM Watson Research Labs and Stanford Research Institute before helping co-found Facebook AI Research, directed by Dr. Yann LeCun. Manohar spent his formative years at IIIT Hyderabad, where he finished his undergraduate studies with Honors in Computer Vision, and joined Georgia Tech to pursue his Ph.D. For over a decade he has been working on various problems related to Computer Vision and perception in general, and has made contributions through his publications at CVPR, NIPS, ICCV, ECCV, ICLR, KDD, IROS, ACCV etc. He is passionate about building real-world systems that are used by billions of people and bring positive value at scale. Some of these systems are running at Facebook and already have tremendous impact on how people communicate using Facebook.

 

Object detection methods for Common Objects in Context


Soumith Chintala

Facebook AI Research

Date : 19/01/2016

 

Abstract:

In this talk, we discuss this year's COCO object detection challenge and Facebook's entry, which placed 2nd in the competition using deep convnets and specially designed cost functions. We shall also discuss the DeepMask segmentation system that forms the core of our detection pipeline. http://mscoco.org/dataset/#detections-leaderboard

Bio:

Soumith Chintala is a well-known face in the deep learning community. He works with Facebook AI Research (FAIR). Prior to that, he worked with MuseAmi and Prof. Yann LeCun's group at NYU. He is well known as the lead maintainer of the Torch7 machine learning library and other open-source projects such as convnet-benchmarks and eblearn.

 

Bayesian Nonparametric Modeling of Temporal Coherence for Entity-driven Video Analytics


Adway Mitra

PhD, IISc Bangalore

Date : 19/01/2016

 

Abstract:

Due to the advent of video-sharing sites like YouTube, online user-generated video content is increasing very rapidly. To simplify the search for meaningful information in such a huge volume of content, Computer Vision researchers have started to work on problems like Video Summarization and Scene Discovery from videos. People understand videos based on high-level semantic concepts. But most of the current research in video analytics makes use of low-level features and descriptors, which may not have a semantic interpretation. We have aimed to fill this gap by modeling implicit structural information about videos, such as spatio-temporal properties. We have represented videos as a collection of semantic visual concepts which we call “entities”, such as persons in a movie. To aid these tasks, we have attempted to model the important property of “temporal coherence”, which means that adjacent frames are likely to have similar visual features and contain the same set of entities. Bayesian nonparametrics is a natural way of modeling all of this, but it has also given rise to the need for new models and algorithms.

A tracklet is a spatio-temporal fragment of a video: a set of spatial regions in a short sequence of consecutive frames, each of which encloses a particular entity. We first attempt to find a representation of tracklets to aid tracking entities across videos, using region descriptors like covariance matrices of spatial features that make use of temporal coherence. Next, we move to modeling temporal coherence at a semantic level. Each tracklet is associated with an entity. Spatio-temporally close but non-overlapping tracklets are likely to belong to the same entity, while tracklets that overlap in time can never belong to the same entity. The aim is to cluster the tracklets based on the entities associated with them, with the goal of discovering the entities in the video along with all their occurrences. We represented an entity with a mixture component, and proposed a temporally coherent version of the Chinese Restaurant Process (TC-CRP) that can encode these constraints easily. TC-CRP shows excellent performance on person discovery from TV-series videos. We also discuss semantic video summarization based on entity discovery. Next, we considered entity-driven temporal segmentation of the video into scenes, where each scene is modeled as a sparse distribution over entities. We proposed EntScene, a generative model for videos based on entities and scenes, and an inference algorithm based on Dynamically Blocked Gibbs Sampling. We experimentally found significant improvements in terms of segmentation and scene discovery compared to alternative inference algorithms.

 

Recent Advances in the Field of Missing Data: Missingness Graphs, Recoverability and Testability


Karthika Mohan

UCLA

Date : 06/01/2016

 

Abstract:

The talk will discuss recent advances in the field of missing data, including: 1) the graphical representation called "Missingness Graph" that portrays the causal mechanisms responsible for missingness, 2) the notion of recoverability, i.e. deciding whether there exists a consistent estimator for a given query, 3) graphical conditions (necessary and sufficient) for recovering joint and conditional distributions and algorithms for detecting these conditions in the missingness graph, 4) the question of testability, i.e. whether an assumed model can be subjected to statistical tests, considering the missingness in the data, and 5) the indispensability of causal assumptions for large sets of missing data problems.

Brief Bio:

Karthika Mohan is a PhD Candidate in the Department of Computer Science at UCLA, majoring in Artificial Intelligence. She is advised by Prof. Judea Pearl. Her research interests span areas in Artificial Intelligence and Statistics that intersect with Probabilistic Graphical Models, Causal Inference and Missing Data.

 

 
From 2015

Understanding Deep Image Representations by Inverting Them


Aravindh Mahendran

D.Phil. student at the University of Oxford

Date : 19/12/2015

 

Abstract:

Image representations, from SIFT and Bag of Visual Words to Convolutional Neural Networks (CNNs), are a crucial component of almost any image understanding system. Nevertheless, our understanding of them remains limited. In this talk I'll discuss our experiments on the visual information contained in representations by asking the following question: given an encoding of an image, to what extent is it possible to reconstruct the image itself? To answer this question we contribute a general framework to invert representations. We show that this method can invert representations such as HOG and SIFT more accurately than recent alternatives while being applicable to CNNs too. We then use this technique to study the inverse of recent state-of-the-art CNN image representations for the first time. Among our findings, we show that several layers in CNNs retain photographically accurate information about the image, with different degrees of geometric and photometric invariance.
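As a rough illustration of the inversion-by-optimization idea (not the exact formulation or regularizers from this work), the sketch below reconstructs an image whose CNN features match a target code by gradient descent on the pixels, using a torchvision VGG16 layer as an assumed stand-in encoder and a simple total-variation prior.

    # Sketch: invert a CNN representation by optimizing the input image so that
    # its features match a target code. VGG16/relu3_3 is an assumed stand-in
    # encoder; the talk's method also handles HOG/SIFT and uses stronger priors.
    import torch
    from torchvision import models

    encoder = models.vgg16(pretrained=True).features[:16].eval()  # up to relu3_3
    target_img = torch.rand(1, 3, 224, 224)                       # placeholder image
    with torch.no_grad():
        target_code = encoder(target_img)

    x = torch.rand(1, 3, 224, 224, requires_grad=True)            # reconstruction
    opt = torch.optim.Adam([x], lr=0.05)
    for _ in range(200):
        opt.zero_grad()
        feat_loss = torch.nn.functional.mse_loss(encoder(x), target_code)
        tv = (x[..., 1:, :] - x[..., :-1, :]).abs().mean() + \
             (x[..., :, 1:] - x[..., :, :-1]).abs().mean()        # total-variation prior
        (feat_loss + 0.1 * tv).backward()
        opt.step()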

Brief Bio:

Aravindh completed his undergraduate degree in CSE at IIIT Hyderabad from 2008 to 2012. He worked in the Cognitive Science lab and the Robotics lab as part of his undergraduate research (B.Tech honors). He completed an MSc in Robotics at Carnegie Mellon University and is currently reading for a D.Phil in Engineering Science at the University of Oxford, in the Visual Geometry Group with Prof. Andrea Vedaldi.

 

Understanding Reality to Generate Credible Augmentations


Pushmeet Kohli

Microsoft Research

Date : 01/12/2015

 

Brief Bio:

Pushmeet Kohli is a principal research scientist at Microsoft Research. In 2015, he was appointed technical advisor to Rick Rashid, the Chief Research Officer of Microsoft. Pushmeet's research revolves around Intelligent Systems and Computational Sciences, and he publishes in the fields of Machine Learning, Computer Vision, Information Retrieval, and Game Theory. His current research interests include 3D Reconstruction and Rendering, Probabilistic Programming, and Interpretable and Verifiable Knowledge Representations from Deep Models. He is also interested in conversation agents for task completion, machine learning systems for healthcare, and 3D rendering and interaction for augmented and virtual reality. His papers have won awards at ICVGIP 2006 and 2010, ECCV 2010, ISMAR 2011, TVX 2014, CHI 2014, WWW 2014 and CVPR 2015. His research has also been the subject of a number of articles in popular media outlets such as Forbes, Wired, BBC, New Scientist and MIT Technology Review. Pushmeet is a part of the Association for Computing Machinery's (ACM) Distinguished Speaker Program.


 

Learning to Super-Resolve Images Using Self-Similarities


Dr. Abhishek Singh

Research Scientist, Amazon Lab126

Date : 17/11/2015

 

Abstract:

The single image super-resolution problem involves estimating a high-resolution image from a single, low-resolution observation. Due to its highly ill-posed nature, the choice of appropriate priors has been an active research area of late. Data-driven or learning-based priors have been successful in addressing this problem. In this talk, I will review some recent learning-based approaches to the super-resolution problem, and present some novel algorithms which can better super-resolve high-frequency details in the scene. In particular, I will talk about novel self-similarity driven algorithms that do not require any external database of training images, but instead learn the mapping from low-resolution to high-resolution using patch recurrence across scales, within the same image. Furthermore, I will also present a novel framework for jointly/simultaneously addressing the super-resolution and denoising problems, in order to obtain a clean, high-resolution image from a single, noise-corrupted, low-resolution observation.
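A heavily simplified toy sketch of the patch-recurrence idea follows (single grayscale channel, brute-force nearest-neighbour search); it is not the algorithms presented in the talk, which add sub-pixel search, back-projection and joint denoising. Each low-resolution patch is matched against patches of a downscaled copy of the input, and the matching patch's higher-resolution "parent" from the original image is pasted into the output.

    # Toy self-similarity super-resolution via patch recurrence across scales.
    # `img` is assumed to be a 2D float array in [0, 1]; slow, for illustration only.
    import numpy as np
    from skimage.transform import resize
    from sklearn.neighbors import NearestNeighbors

    def sr_self_similarity(img, scale=2, p=5):
        h, w = img.shape
        small = resize(img, (h // scale, w // scale), anti_aliasing=True)
        # Database: patches of the downscaled image and their HR parents in `img`.
        keys, parents = [], []
        for i in range(small.shape[0] - p + 1):
            for j in range(small.shape[1] - p + 1):
                keys.append(small[i:i+p, j:j+p].ravel())
                parents.append(img[i*scale:(i+p)*scale, j*scale:(j+p)*scale])
        nn = NearestNeighbors(n_neighbors=1).fit(np.array(keys))
        out = np.zeros((h * scale, w * scale))
        weight = np.zeros_like(out)
        # For each input patch, paste the HR parent of its best match, averaging overlaps.
        for i in range(h - p + 1):
            for j in range(w - p + 1):
                q = img[i:i+p, j:j+p].ravel()[None]
                idx = nn.kneighbors(q, return_distance=False)[0, 0]
                out[i*scale:(i+p)*scale, j*scale:(j+p)*scale] += parents[idx]
                weight[i*scale:(i+p)*scale, j*scale:(j+p)*scale] += 1
        return out / np.maximum(weight, 1)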

Brief Bio:

Abhishek Singh is a Research Scientist at Amazon Lab126 in Sunnyvale, California. He obtained a Ph.D. in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign in Feb 2015, where he worked with Prof. Narendra Ahuja on learning based super-resolution algorithms, among other problems. He was the recipient of the Joan and Lalit Bahl Fellowship, and the Computational Science and Engineering Fellowship at the University of Illinois. He has also been affiliated with Mitsubishi Electric Research Labs, Siemens Corporate Research, and UtopiaCompression Corporation. His current research interests include learning based approaches for low level vision and image processing problems. For more information, please visit http://www.abhishek486.com

 

Multi-view Learning using Statistical Dependence


Dr. Abhishek Tripathi

Research Scientist, Xerox Research Centre India

Date : 02/11/2015

 

Abstract:

Multi-view learning is the task of learning from multiple sources with co-occurring samples. Here, I will talk about multi-view learning techniques which find shared information between multiple sources in an unsupervised setting. We use statistical dependence as a measure to find shared information. Multi-view learning becomes more challenging and interesting (i) without co-occurring samples in multiple views and (ii) with an arbitrary collection of matrices. I will present our work around these two problems with the help of some practical applications.

Brief Bio:

Dr. Abhishek Tripathi has been working as a Research Scientist at Xerox Research Centre India (XRCI), Bangalore, since January 2012. He is part of the Machine Learning group, where the focus domains include Transportation, Healthcare and Human Resources. Prior to XRCI, Abhishek spent one year at Xerox Research Centre Europe, France. He received his PhD in Computer Science from the University of Helsinki, Finland. His research interests include unsupervised multi-view learning, matrix factorization, recommender systems, data fusion and dimensionality reduction.

 

Parallel Inverse Kinematics for Multi-Threaded Architectures


Dr. Pawan Harish

Post Doctoral Researcher at EPFL

Date : 02/11/2015

 

Abstract:

In this talk I will present a parallel, prioritized, Jacobian-based inverse kinematics algorithm for multi-threaded architectures. The approach solves damped least squares inverse kinematics using a parallel line search, by identifying and sampling critical input parameters. Parallel competing execution paths are spawned for each parameter in order to select the optimum which minimizes the error criterion. The algorithm is highly scalable and can handle complex articulated bodies at interactive frame rates. Results are shown on complex skeletons consisting of more than 600 degrees of freedom controlled using multiple end effectors. We implement our algorithm both on multi-core and GPU architectures and demonstrate how the GPU can further exploit fine-grained parallelism not directly available on a multi-core processor. The implementations are 10-150 times faster than state-of-the-art serial implementations while providing higher accuracy. We also demonstrate the scalability of the algorithm over multiple scenarios and explore the GPU implementation in detail.
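For reference, the serial core that such methods build on is a single damped-least-squares update; a minimal sketch is below, where fk and jacobian are assumed, user-supplied forward-kinematics and Jacobian functions (the talk's contribution is the parallel prioritized line search around steps like this, not this update itself).

    # One damped-least-squares (DLS) IK step: dq = J^T (J J^T + lambda^2 I)^-1 e
    import numpy as np

    def dls_step(q, target, fk, jacobian, damping=0.1, step=1.0):
        e = target - fk(q)            # end-effector error (e.g. a 3-vector)
        J = jacobian(q)               # m x n task Jacobian at the current pose
        JJt = J @ J.T
        dq = J.T @ np.linalg.solve(JJt + damping**2 * np.eye(JJt.shape[0]), e)
        return q + step * dq          # updated joint angles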

Brief Bio:

Pawan Harish joined the PhD program at IIIT Hyderabad, India in 2005, where he focused on computational displays and on parallelizing graph algorithms on the GPU under the supervision of Prof. P. J. Narayanan. He completed his PhD in 2013 and joined the University of California, Irvine as a visiting scholar. He worked at Samsung Research India as a technical lead before joining IIG EPFL as a post-doctoral researcher in June 2014. His current research, in association with Moka Studios, is on designing a parallel inverse kinematics algorithm on the GPU. His interests include parallel algorithms, novel displays, CHI and computer graphics.

 

Artificial Intelligence Research @ Facebook


Dr. Manohar Paluri

Ph.D. at Georgia Institute of Technology

Date : 30/10/2015

 

Abstract:

Facebook has to deal with billions of data points (likes, comments, shares, posts, photos, videos and many more). The only way to provide access to this plethora of information in a structured way is to understand the data and learn mathematical models for various problems (like Search, NewsFeed Ranking, Instagram trending, Ads targeting etc.). The Facebook Artificial Intelligence Research (FAIR) group aims to do this with some of the brightest minds in the field, directed by Dr. Yann LeCun. Our goal is to be at the forefront of Artificial Intelligence and bring that technology to the billions of users using Facebook and beyond. In this talk I will touch on a few aspects of our focus areas and focus more on Computer Vision related projects. This will be a very high-level overview talk with a few slides of technical details, and the goal is to motivate everyone about the challenges we face at FAIR.

Brief Bio:

Manohar Paluri is currently managing the Applied Computer Vision group at Facebook. His group focuses on building the world's largest image and video understanding platform. Manohar got his Bachelor's degree from IIIT Hyderabad with Honors in Computer Vision and his Master's from Georgia Tech. While pursuing his Ph.D. at Georgia Tech and prior to joining Facebook, Manohar worked with Dr. Steve Seitz's 3D Maps team at Google Research, the video surveillance group at IBM Watson Labs, and the applied computer vision group at Sarnoff (now part of SRI International).

 

My Research on Human-centered computing


Dr. Ramanathan Subramanian

National University of Singapore

Date : 26/03/2015

 

Abstract:

Human-centered computing focuses on all aspects of integrating the human/user within the computational loop of Artificial Intelligence (AI) systems. I am currently interested in developing applications that can analyze humans to make informed decisions (Human Behavior Understanding), and interactively employ implicit human signals (eye movements, EEG/MEG responses) for feedback.

In my talk, I will first present my research on analyzing human behavior from free-standing conversational groups, and the associated problems of head pose estimation and F-formation detection from low-resolution images. Then, I will also describe my studies on emotional perception using eye movements and brain signals.

Brief Bio:

Ramanathan Subramanian is a Research Scientist at the Advanced Digital Sciences Center (ADSC). Previously, he has served as a post-doctoral researcher at the Dept. of Information Engineering and Computer Science, University of Trento, Italy and the School of Computing, NUS. His research interests span Human-computer Interaction, Human-behavior understanding, Computer Vision and Computational Social sciences. He is especially interested in studying and modelling various aspects of human visual and emotional perception. His research has contributed to a number of behavioral databases including the prominent NUSEF (eye-fixation), DPOSE (dynamic head pose) and DECAF (Decoding affective multimedia from MEG brain responses) datasets.


 

Towards automatic video production of staged performances

Dr. Vineet Gandhi

University of Grenoble, France

Date : 26/02/2015

 

Abstract:

Professional quality videos of live staged performances are created by recording them from different appropriate viewpoints. These are then edited together to portray an eloquent story replete with the ability to draw out the intended emotion from the viewers. Creating such competent videos typically requires a team of skilled camera operators to capture the scene from multiple viewpoints. In this talk, I will introduce an alternative approach where we automatically compute camera movements in post-production using specially designed computer vision methods.

First, I will explain our novel approach for tracking objects and actors in long video sequences. Second, I will describe how the actor tracks can be used for automatically generating multiple clips suitable for video editing by simulating pan-tilt-zoom camera movements within the frame of a single high resolution static camera. I will conclude my talk by presenting test and validation results on a challenging corpus of theatre recordings and demonstrating how the proposed methods open the way to novel applications for cost effective video production of live performances including, but not restricted to, theatre, music and opera.

Brief Bio:

Vineet Gandhi obtained his B.Tech. degree in Electronics and Communication Engineering from the Indian Institute of Information Technology, Design and Manufacturing Jabalpur, and a master's degree under the prestigious Erasmus Mundus Scholarship program. During his master's he studied in three different countries for a semester each, specializing in the areas of optics, image and vision. He did his master's thesis in the Perception team at INRIA, France, in collaboration with Samsung Research. He later joined the Imagine team at INRIA as a doctoral researcher and obtained his PhD degree in computer science and applied mathematics.

His current research interests are in the areas of visual learning/detection/recognition, computational photography/videography and sensor fusion for 3D reconstruction. He also enjoys working in the field of optics, colorimetry and general mathematics of signal and image processing.


 

The Current Landscape of Deep Learning: Trends and Challenges


Soumith Chintala

Facebook AI Research

Date : 18/02/2015

 

Abstract:

Recent years have seen rapid growth of the field of Deep Learning. Research in Deep Learning has put forth many new ideas which have led to many path-breaking developments in Machine Learning. The large set of tools and techniques that has emerged from this research is fast being explored and applied to different problems in Computer Vision and NLP. This talk shall focus on some of the current trends and practices in Deep Learning research and the challenges that lie ahead.

 

 

Extreme Classification: A New Paradigm for Ranking & Recommendation

Manik Varma

Microsoft Research India

Date : 10/02/2015

 

Abstract:

The objective in extreme multi-label classification is to learn a classifier that can automatically tag a data point with the most relevant subset of labels from a large label set. Extreme multi-label classification is an important research problem since not only does it enable the tackling of applications with many labels but it also allows the reformulation of ranking and recommendation problems with certain advantages over existing formulations.

Our objective, in this talk, is to develop an extreme multi-label classifier that is faster to train and more accurate at prediction than the state-of-the-art Multi-label Random Forest (MLRF) algorithm [Agrawal et al. WWW 13] and the Label Partitioning for Sub-linear Ranking (LPSR) algorithm [Weston et al. ICML 13]. MLRF and LPSR learn a hierarchy to deal with the large number of labels, but optimize task-independent measures, such as the Gini index or clustering error, in order to learn the hierarchy. Our proposed FastXML algorithm achieves significantly higher accuracies by directly optimizing an nDCG-based ranking loss function. We also develop an alternating minimization algorithm for efficiently optimizing the proposed formulation. Experiments reveal that FastXML can be trained on problems with more than a million labels on a standard desktop in eight hours using a single core and in an hour using multiple cores.
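For readers unfamiliar with the ranking measure involved, the snippet below computes nDCG@k for a predicted label ranking against a set of relevant labels; this is only the metric that FastXML's objective is built around, not the training algorithm itself, and the example labels are made up.

    # nDCG@k for a ranked list of predicted labels against the relevant set.
    import numpy as np

    def ndcg_at_k(ranked_labels, relevant, k=5):
        gains = [1.0 if lbl in relevant else 0.0 for lbl in ranked_labels[:k]]
        dcg = sum(g / np.log2(i + 2) for i, g in enumerate(gains))
        ideal = sum(1.0 / np.log2(i + 2) for i in range(min(k, len(relevant))))
        return dcg / ideal if ideal > 0 else 0.0

    # Example: two of the top-3 predicted labels are relevant.
    print(ndcg_at_k(["cat", "dog", "car"], {"cat", "car"}, k=3))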

Brief Bio:

Manik Varma is a researcher at Microsoft Research India where he helps champion the Machine Learning and Optimization area. Manik received a bachelor's degree in Physics from St. Stephen's College, University of Delhi in 1997 and another one in Computation from the University of Oxford in 2000 on a Rhodes Scholarship. He then stayed on at Oxford on a University Scholarship and obtained a DPhil in Engineering in 2004. Before joining Microsoft Research, he was a Post-Doctoral Fellow at the Mathematical Sciences Research Institute Berkeley. He has been an Adjunct Professor at the Indian Institute of Technology (IIT) Delhi in the Computer Science and Engineering Department since 2009 and jointly in the School of Information Technology since 2011. His research interests lie in the areas of machine learning, computational advertising and computer vision. He has served as an Area Chair for machine learning and computer vision conferences such as ACCV, CVPR, ICCV, ICML and NIPS. He has been awarded the Microsoft Gold Star award and has won the PASCAL VOC Object Detection Challenge.

From 2014

What Motion Reveals about Shape with Unknown Material Behavior

Manmohan Chandraker

NEC Labs America
Date : 11/12/2014

 

Abstract:

Image formation is an outcome of a complex interaction between object geometry, lighting and camera, as governed by the reflectance of the underlying material. Psychophysical studies show that motion of the object, light source or camera is an important cue for shape perception from image sequences. However, due to the complex and often unknown nature of the bidirectional reflectance distribution function (BRDF) that determines material behavior, computer vision algorithms have traditionally relied on simplifying assumptions such as brightness constancy or Lambertian reflectance. We take a step towards overcoming those limitations by answering a fundamental question: what does motion reveal about unknown shape and material? In each case of light source, object or camera motion, we show that physical properties of BRDFs yield PDE invariants that precisely characterize the extent of shape recovery under a given imaging condition. Conventional optical flow, multiview stereo and photometric stereo follow as special cases. This leads to the surprising result that motion can decipher shape even with complex, unknown material behavior and unknown lighting. Further, we show that contrary to intuition, joint recovery of shape, material and lighting using motion cues is often well-posed and tractable, requiring the solution of only sparse linear systems.

At the beginning of the talk, I will also describe our recent work on 3D scene understanding for autonomous driving. Using a single camera, we demonstrate real-time structure from motion performance on par with stereo, on the challenging KITTI benchmark. Combined with top-performing methods in object detection and tracking, we demonstrate 3D object localization with high accuracy comparable to LIDAR. We demonstrate high-level applications such as scene recognition that form the basis for collaborations on collision avoidance and danger prediction with automobile manufacturers.

Brief Bio:

Manmohan Chandraker received a B.Tech. in Electrical Engineering at the Indian Institute of Technology, Bombay and a PhD in Computer Science at the University of California, San Diego. Following a postdoctoral scholarship at the University of California, Berkeley, he joined NEC Labs America in Cupertino, where he conducts research in computer vision. His principal research interests are modern optimization methods for geometric 3D reconstruction, 3D scene understanding and recognition for autonomous driving and shape recovery in the presence of complex illumination and material behavior. His work has received the Marr Prize Honorable Mention for Best Paper at ICCV 2007, the 2009 CSE Dissertation Award for Best Thesis at UC San Diego, a nomination for the 2010 ACM Dissertation Award and the Best Paper Award at CVPR 2014, besides appearing in Best Paper Special Issues of IJCV 2009, IEEE PAMI 2011 and 2014.


 

Scalable Scientific Image Informatics

Prof. B.S. Manjunath

University of California, Santa Barbara
Date : 25/09/2014

 

Abstract:

Recent advances in microscopy imaging, image processing and computing technologies enable large scale scientific experiments that generate not only large collections of images and video, but also pose new computing and information processing challenges. These include providing ubiquitous access to images, videos and metadata resources; creating easily accessible image and video analysis, visualizations and workflows; and publishing both data and analysis resources. Further, contextual metadata, such as experimental conditions in biology, are critical for quantitative analysis. Streamlining collaborative efforts across distributed research teams with online virtual environments will improve scientific productivity, enhance understanding of complex phenomena and allow a growing number of researchers to quantify conditions based on image evidence that so far have remained subjective. This talk will focus on recent work in my group on image segmentation and quantification, followed by a detailed description of the BisQue platform. BisQue (Bio-Image Semantic Query and Environment) is an open-source platform for integrating image collections, metadata, analysis, visualization and database methods for querying and search. We have developed new techniques for managing user-defined datamodels for biological datasets, including experimental protocols, images, and analysis. BisQue is currently used in many laboratories around the world and is integrated into the iPlant cyber-infrastructure (see http://www.iplantcollaborative.org) which serves the plant biology community. For more information on BisQue see http://www.bioimage.ucsb.edu.

Brief Bio:

B. S. Manjunath received the B.E. degree (with distinction) in electronics from Bangalore University, Bangalore, India, in 1985, the M.E. degree (with distinction) in systems science and automation from the Indian Institute of Science, Bangalore, in 1987, and the Ph.D. degree in electrical engineering from the University of Southern California, Los Angeles, in 1991. He is a Professor of electrical and computer engineering, Director of the Center for Bio-Image Informatics, and Director of the newly established Center on Multimodal Big Data Science and Healthcare at the University of California, Santa Barbara. His current research interests include image processing, distributed processing in camera networks, data hiding, multimedia databases, and bio-image informatics. He has published over 250 peer-reviewed articles on these topics and is a co-editor of the book Introduction to MPEG-7 (Wiley, 2002). He was an associate editor of the IEEE Transactions on Image Processing, Pattern Analysis and Machine Intelligence, Multimedia, Information Forensics, and IEEE Signal Processing Letters, and is currently an AE for the BMC Bioinformatics journal. He is a co-author of the paper that won the 2013 Transactions on Multimedia best paper award and is a fellow of the IEEE.


 

Rounding-based Moves for Metric Labeling


Dr. M. Pawan Kumar

Ecole Centrale Paris
Date : 27/08/2014

 

Abstract:

Metric labeling is an important special case of energy minimization in pairwise graphical models. The dominant methods for metric labeling in the computer vision community belong to the move-making family, due to their computational efficiency. The dominant methods in the computer science community belong to the convex relaxations family, due to their strong theoretical guarantees. In this talk, I will present algorithms that combine the best of both worlds: efficient move-making algorithms that provide the same guarantees as the standard linear programming relaxation.

Brief Bio:

M. Pawan Kumar is an Assistant Professor at Ecole Centrale Paris, and a member of INRIA Saclay. Prior to that, he was a PhD student at Oxford Brookes University (Brookes Vision Group; 2003-2007), a postdoc at Oxford University (Visual Geometry Group; 2008), and a postdoc at Stanford University (Daphne's Approximate Group of Students; 2009-2011).


 

Signal Processing Meets Optics: Theory, Design and Inference for Computational Imaging


Dr. Kaushik Mitra

Rice University
Date : 24/04/2014

 

Abstract:

For centuries, imaging devices have been based on the principle of 'pinhole projection', which directly captures the desired image. However, it has recently been shown that by co-designing the imaging optics and processing algorithms, we can obtain Computational Imaging (CI) systems that far exceed the performance of traditional cameras. Despite the advances made in designing new computational cameras, there are still many open issues such as: 1) lack of a proper theoretical framework for analysis and design of CI cameras, 2) lack of camera designs for capturing the various dimensions of light with high fidelity and 3) lack of proper use of data-driven methods that have shown tremendous success in other domains.

In this talk, I will address the above-mentioned issues. First, I will present a comprehensive framework for the analysis of computational imaging systems and provide explicit performance guarantees for many CI systems such as light field and extended-depth-of-field cameras. Second, I will show how camera arrays can be exploited to capture the various dimensions of light such as spectrum and angle. Capturing these dimensions leads to novel imaging capabilities such as post-capture refocusing, hyper-spectral imaging and natural image retouching. Finally, I will talk about how various machine learning techniques such as robust regression and matrix factorization can be used for solving many imaging problems.

Brief Bio:

Kaushik Mitra is currently a postdoctoral research associate in the Electrical and Computer Engineering department of Rice University. His research interests are in computational imaging, computer vision and statistical signal processing. He earned his Ph.D. in Electrical and Computer Engineering from the University of Maryland, College Park, where his research focus was on the development of statistical models and optimization algorithms for computer vision problems.


 

Cross-modal Retrieval: Retrieval Across Different Content Modalities

NikhilRasiwasia

Dr. Nikhil Rasiwasia

Yahoo Labs, Bangalore
Date : 01/04/2014

 

Abstract:

In this talk, the problem of cross-modal retrieval from multimedia repositories is considered. This problem addresses the design of retrieval systems that support queries across content modalities, e.g., using an image to search for texts. A mathematical formulation is proposed, equating the design of cross-modal retrieval systems to that of isomorphic feature spaces for different content modalities. Two hypotheses are then investigated, regarding the fundamental attributes of these spaces. The first is that low-level cross-modal correlations should be accounted for. The second is that the space should enable semantic abstraction. Three new solutions to the cross-modal retrieval problem are then derived from these hypotheses: correlation matching (CM), an unsupervised method which models cross-modal correlations, semantic matching (SM), a supervised technique that relies on semantic representation, and semantic correlation matching (SCM), which combines both. An extensive evaluation of retrieval performance is conducted to test the validity of the hypotheses. All approaches are shown successful for text retrieval in response to image queries and vice-versa. It is concluded that both hypotheses hold, in a complementary form, although the evidence in favor of the abstraction hypothesis is stronger than that for correlation.
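As a rough, self-contained illustration of the correlation matching (CM) idea described above, the sketch below learns maximally correlated image and text subspaces with CCA and retrieves across modalities by similarity in the shared space; the features are random placeholders, and this is not the talk's exact pipeline, data or evaluation.

    # Correlation matching sketch: project paired image/text features into a
    # shared CCA space, then retrieve texts for an image query by similarity.
    import numpy as np
    from sklearn.cross_decomposition import CCA
    from sklearn.metrics.pairwise import cosine_similarity

    rng = np.random.default_rng(0)
    img_feats = rng.standard_normal((200, 128))   # placeholder image features
    txt_feats = rng.standard_normal((200, 50))    # placeholder text features

    cca = CCA(n_components=10).fit(img_feats, txt_feats)
    img_c, txt_c = cca.transform(img_feats, txt_feats)

    # Query with the first image; rank all texts in the common subspace.
    scores = cosine_similarity(img_c[:1], txt_c)[0]
    print(scores.argsort()[::-1][:5])             # indices of the top-5 texts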

Brief Bio:

Nikhil Rasiwasia received the B.Tech degree in Electrical Engineering from the Indian Institute of Technology Kanpur (India) in 2005. He received the MS and PhD degrees from the University of California, San Diego in 2007 and 2011 respectively, where he was a graduate student researcher at the Statistical Visual Computing Laboratory in the ECE department. Currently, he is working as a scientist at Yahoo Labs, Bangalore, India. In 2008, he was recognized as an 'Emerging Leader in Multimedia' by IBM T. J. Watson Research. He also received the best student paper award at the ACM Multimedia conference in 2010. His research interests are in the areas of computer vision and machine learning, in particular applying machine learning solutions to computer vision problems.


 

Lifting 3D Manhattan Lines from a Single Image

Dr. Srikumar Ramalingam

Mitsubishi Electric Research Labs (MERL)
Date : 10/01/2014

 

Abstract:

In the first part of the talk, I will present a novel and efficient method for reconstructing the 3D arrangement of lines extracted from a single image, using vanishing points, orthogonal structure, and an optimization procedure that considers all plausible connectivity constraints between lines. Line detection identifies a large number of salient lines that intersect or nearly intersect in an image, but relatively few of these apparent junctions correspond to real intersections in the 3D scene. We use linear programming (LP) to identify a minimal set of least-violated connectivity constraints that are sufficient to unambiguously reconstruct the 3D lines. In contrast to prior solutions that primarily focused on well-behaved synthetic line drawings with severely restricting assumptions, we develop an algorithm that can work on real images. The algorithm produces line reconstructions by identifying 95% correct connectivity constraints in the York Urban database, with a total computation time of 1 second per image.

In the second part of the talk, I will briefly mention my other work in graphical models, robotics, geo-localization, generic camera modeling and 3D reconstruction.

Brief Bio:

Srikumar Ramalingam is a Principal Research Scientist at Mitsubishi Electric Research Lab (MERL). He received his B.E. from Anna University (Guindy) in India and his M.S. from the University of California, Santa Cruz, in the USA. He received a Marie Curie Fellowship from the European Union to pursue his studies at INRIA Rhone Alpes (France), and he obtained his PhD in 2007. His thesis on generic imaging models received the INPG best thesis prize and the AFRIF thesis prize (honorable mention) from the French Association for Pattern Recognition. After his PhD, he spent two years in Oxford working as a research associate at Oxford Brookes University, while being an associate member of the Visual Geometry Group at Oxford University. He has published more than 30 papers in flagship conferences such as CVPR, ICCV, SIGGRAPH Asia and ECCV. He has co-edited journals, coauthored books, given tutorials and organized workshops on topics such as multi-view geometry and discrete optimization. His research interests are in computer vision, machine learning and robotics problems.

From 2013

Higher-Order Grouping on the Grassmann Manifold


Venu Madhav Govindu

Indian Institute of Science, Bengaluru
Date : 15/11/2013

 

Abstract:

The higher-order clustering problem arises when data is drawn from multiple subspaces or when observations fit a higher-order parametric model. In my talk I will present a tensor decomposition solution to this problem and its refinement based on estimation on a Grassmann manifold. The method exploits recent advances in online estimation on the Grassmann manifold and is consequently efficient and scalable, with a low memory requirement. I will present results of this method applied to a variety of segmentation problems, including planar segmentation of Kinect depth maps and motion segmentation of the Hopkins 155 dataset, for which we achieve performance comparable to the state-of-the-art.

Brief Bio:

Venu Madhav Govindu is with the Department of Electrical Engineering, Indian Institute of Science, Bengaluru. His research interests pertain to geometric and statistical estimation in computer vision.
