Understanding Deep Image Representations by Inverting Them

 

Abstract:

Image representations, from SIFT and Bag of Visual Words to Convolutional Neural Networks (CNNs), are a crucial component of almost any image understanding system. Nevertheless, our understanding of them remains limited. In this talk I'll discuss our experiments probing the visual information contained in image representations by asking the following question: given an encoding of an image, to what extent is it possible to reconstruct the image itself? To answer this question, we contribute a general framework for inverting representations. We show that this method can invert representations such as HOG and SIFT more accurately than recent alternatives, while also being applicable to CNNs. We then use this technique to study, for the first time, the inverses of recent state-of-the-art CNN image representations. Among our findings, we show that several layers in CNNs retain photographically accurate information about the image, with different degrees of geometric and photometric invariance.
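The inversion idea described above can be sketched in a few lines: given a target code y0 = phi(x0), search for an image x that minimizes a reconstruction loss ||phi(x) - y0||^2 plus a regularizer, by gradient descent. The toy code below (a sketch, not the authors' implementation) uses a linear map as a stand-in for the representation phi and a simple L2 regularizer, so the gradient is analytic; the actual work applies the same principle to HOG, SIFT, and CNN features with image-domain regularizers.

```python
import numpy as np

# Toy illustration of inverting a representation:
# minimize ||phi(x) - y0||^2 + lam * ||x||^2 by gradient descent,
# where phi(x) = A @ x is a stand-in for a real feature extractor.

rng = np.random.default_rng(0)
n, m = 16, 64                      # "image" size, feature size
A = rng.standard_normal((m, n))    # hypothetical linear representation phi
x0 = rng.standard_normal(n)        # image to be reconstructed
y0 = A @ x0                        # its encoding phi(x0)

lam = 1e-3                         # weight of the L2 regularizer
lr = 1e-3                          # gradient-descent step size
x = np.zeros(n)                    # start from a blank image
for _ in range(2000):
    # gradient of ||A x - y0||^2 + lam ||x||^2 with respect to x
    grad = 2 * A.T @ (A @ x - y0) + 2 * lam * x
    x -= lr * grad

err = np.linalg.norm(x - x0) / np.linalg.norm(x0)
print("relative reconstruction error:", err)
```

With an overcomplete linear phi the reconstruction is nearly exact; the interesting cases in the talk are lossy, invariant representations, where the same optimization reveals which visual information the code preserves.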

Brief Bio:

Aravindh completed his undergraduate degree in CSE at IIIT Hyderabad (2008-2012), where he worked in the Cognitive Science lab and the Robotics lab as part of his undergraduate research (B.Tech honors). He then completed an MSc in Robotics at Carnegie Mellon University and is currently reading for a D.Phil in Engineering Science in the Visual Geometry Group at the University of Oxford, under Prof. Andrea Vedaldi.