
Title Heading


Yash Patel


Date : 16/03/2023



During his presentation, Yash Patel discussed a range of topics related to his research interests, which primarily include self-supervised representation learning, image compression, scene text detection and recognition, tracking and segmentation in videos, and 3D reconstruction.
In particular, he discussed the challenges of training neural networks on non-differentiable losses and presented several approaches for overcoming them. He first presented his technique for training a neural network by minimizing a surrogate loss that approximates a target evaluation metric, which may itself be non-differentiable. The surrogate is learned via a deep embedding in which the Euclidean distance between the prediction and the ground truth corresponds to the value of the evaluation metric. He then described his work on a differentiable surrogate loss for the recall metric. Because this metric is computed over the entire retrieval database, training benefits from a very large batch size; to enable this, he used an implementation that sidesteps the GPU memory limit. In addition, an efficient mixup approach that operates on pairwise scalar similarities was used to virtually increase the batch size further.
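To make the recall surrogate concrete, the Python sketch below replaces each hard step function in Recall@k with a temperature-scaled sigmoid: one inside a soft rank, and one for the "rank at most k" test. It is an illustration under our own assumptions, not the speaker's implementation: the function names, the temperatures `tau_rank` and `tau_k`, the +0.5 margin, and the normalization by min(k, number of positives) are all hypothetical choices. The sketch also shows why mixup on scalar similarities is cheap: for a dot-product similarity, the similarity to a mixed embedding equals the same mix of the two original similarities. The memory-saving large-batch implementation is not reproduced, and a real training loop would express these operations as tensors in an autodiff framework.

```python
import math


def sigmoid(x: float, temperature: float) -> float:
    """Smooth stand-in for the Heaviside step H(x); sharper as temperature -> 0."""
    z = x / temperature
    # Branch on the sign so math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def smooth_recall_at_k(sims, is_positive, k, tau_rank=0.01, tau_k=0.1):
    """Differentiable surrogate of Recall@k for one query (hypothetical helper).

    sims[j] is the query's similarity to database item j and is_positive[j]
    marks the relevant items. Both hard steps are replaced by sigmoids.
    """
    positives = [i for i, p in enumerate(is_positive) if p]
    if not positives:
        raise ValueError("the query needs at least one positive item")
    hits = 0.0
    for i in positives:
        # Soft rank of positive i: 1 + soft count of items scored above it.
        rank = 1.0 + sum(
            sigmoid(sims[j] - sims[i], tau_rank)
            for j in range(len(sims))
            if j != i
        )
        # Soft test for "rank <= k"; the +0.5 margin is an assumption so that
        # a positive at exactly rank k counts as (almost) fully retrieved.
        hits += sigmoid(k - rank + 0.5, tau_k)
    # Assumed normalization: at most min(k, #positives) hits are possible.
    return hits / min(k, len(positives))


def mixed_similarity(s_qa: float, s_qb: float, alpha: float) -> float:
    """Similarity of query q to the virtual sample alpha*a + (1 - alpha)*b.

    Exact for a dot-product similarity (mixed vector not re-normalized),
    because q . (alpha*a + (1-alpha)*b) = alpha*(q . a) + (1-alpha)*(q . b).
    """
    return alpha * s_qa + (1.0 - alpha) * s_qb


# The single positive (index 0) is ranked first, so the surrogate is close to 1.
print(smooth_recall_at_k([0.9, 0.1, 0.2, 0.3], [True, False, False, False], k=1))
# A virtual sample between two items, scored without building its embedding.
print(mixed_similarity(0.8, 0.2, alpha=0.25))
```

Lowering the temperatures brings the surrogate closer to the true metric but makes its gradients sparser, so in this sketch they are a tuning trade-off rather than fixed constants.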


Yash Patel, a Ph.D. candidate at the Center for Machine Perception, Czech Technical University, advised by Prof. Jiri Matas, and an esteemed alumnus of IIIT Hyderabad, visited our institute on 16th March 2023.

He holds a Bachelor of Technology with Honors by Research in Computer Science and Engineering from the International Institute of Information Technology, Hyderabad (IIIT-H). During his undergraduate studies, he worked with Prof. C.V. Jawahar at the Center for Visual Information Technology (CVIT). He also holds a Master's degree in Computer Vision from the Robotics Institute of Carnegie Mellon University, where he worked with Prof. Abhinav Gupta.

Yash's research interests are primarily focused on computer vision with expertise in areas such as self-supervised representation learning, image compression, scene text detection and recognition, tracking and segmentation in videos, and 3D reconstruction.

Building Maximal Vision Systems with Minimal Resources


Dr. Aayush Bansal


Date : 19/12/2022



Current vision and robotic systems are like the mainframe machines of the 60s; they require extensive resources: (1) dense data capture and massive human annotations, (2) large parametric models, and (3) intensive computational infrastructure. I build systems that can learn directly from sparse and unconstrained real-world samples with minimal resources, i.e., limited or no supervision, simple and efficient models, and everyday computational devices. Building systems with minimal resources allows us to democratize them for non-experts. My work has impacted important areas such as virtual reality, content creation, audio-visual editing, and giving a natural voice to speech-impaired individuals.

In my talk, I will first present my efforts to build vision systems for novel view synthesis. I will discuss Neural Pixel Composition, a novel approach for continuous 3D-4D view synthesis that reliably operates on sparse, wide-baseline multi-view images and videos, and that can be trained within a few minutes on high-resolution (12MP) content using 1 GB of GPU memory. I will then present my efforts to build vision systems for unsupervised audio-visual synthesis, focusing on Exemplar Autoencoders, which enable zero-shot audio-visual retargeting. Exemplar Autoencoders are built on two remarkably simple insights: (1) autoencoders project out-of-sample data onto the distribution of the training set; and (2) exemplar learning enables us to capture the voice, stylistic prosody (emotions and ambiance), and visual appearance of the target. These properties enable an autoencoder trained on one individual's voice to generalize to unknown voices in different languages. Exemplar Autoencoders can synthesize natural voices for speech-impaired individuals and perform zero-shot multilingual translation.
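Insight (1) can be illustrated with the simplest possible autoencoder. The toy below is our own assumption and not the Exemplar Autoencoder architecture, which uses deep networks on speech and video: a linear autoencoder with a one-dimensional bottleneck is fitted, via the closed-form PCA solution, to 2-D points that all lie on the line y = 2x. An input that lies off that line is reconstructed as a point on it, i.e., projected onto what the training set spans. The helper names `fit_linear_autoencoder` and `reconstruct` are hypothetical.

```python
import math


def fit_linear_autoencoder(points):
    """Fit a 1-D linear autoencoder to 2-D points using the PCA solution.

    Returns (mean, direction): encoding is a dot product with `direction`,
    decoding scales `direction` by the code and re-adds the mean.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the 2x2 scatter matrix [[a, b], [b, c]] of the centred data.
    a = sum((x - mx) ** 2 for x, _ in points)
    b = sum((x - mx) * (y - my) for x, y in points)
    c = sum((y - my) ** 2 for _, y in points)
    # Closed-form angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    return (mx, my), (math.cos(theta), math.sin(theta))


def reconstruct(point, mean, direction):
    """Encode a 2-D point to a 1-D code, then decode it back to 2-D."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    code = dx * direction[0] + dy * direction[1]  # encoder
    return (mean[0] + code * direction[0], mean[1] + code * direction[1])  # decoder


# "Training set": every sample lies on the line y = 2x.
train = [(-2.0, -4.0), (-1.0, -2.0), (1.0, 2.0), (2.0, 4.0)]
mean, direction = fit_linear_autoencoder(train)

# An out-of-sample input off the line is mapped back onto it, near (1.0, 2.0).
print(reconstruct((3.0, 1.0), mean, direction))
```

A deep autoencoder replaces the line with a learned nonlinear manifold, which is roughly the intuition behind mapping an unknown speaker's voice onto the voice the model was trained on.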


Aayush Bansal is currently a short-term research scientist at Reality Labs Research, Meta Platforms, Inc. He received his Ph.D. in Robotics from Carnegie Mellon University under the supervision of Prof. Deva Ramanan and Prof. Yaser Sheikh. He was a Presidential Fellow at CMU and a recipient of the Uber Presidential Fellowship (2016-17), the Qualcomm Fellowship (2017-18), and the Snap Fellowship (2019-20). His research has been covered by national and international media such as NBC, CBS, WQED, 90.5 WESA FM, France TV, and Journalist. He has also worked with production houses such as BBC Studios and Full Frontal with Samantha Bee (TBS). More details are available on his webpage:

Towards Autonomous Driving in Dense, Heterogeneous, and Unstructured Environments


Dr. Rohan Chandra


Date : 07/11/2022



In this talk, I discuss key problems in autonomous driving in dense, heterogeneous, and unstructured traffic environments. Autonomous vehicles (AVs) are currently restricted to operating on smooth, well-marked roads, in sparse traffic, and among well-behaved drivers. I present new techniques to perceive, predict, and navigate among human drivers in traffic that is significantly denser in terms of the number of traffic agents, more heterogeneous in terms of the size and dynamic constraints of those agents, and where many drivers may not follow traffic rules and exhibit varying behaviors. My talk is structured along three themes: perception, driver behavior modeling, and planning. More specifically, I will talk about (1) improved tracking and trajectory prediction algorithms for dense and heterogeneous traffic using a combination of computer vision and deep learning techniques; (2) a novel behavior modeling approach that uses graph theory to characterize human drivers as aggressive or conservative from their trajectories; and (3) behavior-driven planning and navigation algorithms for mixed and unstructured traffic environments using game theory and risk-aware planning. Finally, I will conclude by discussing the future implications and broader applications of these ideas in social robotics, where robots are deployed in warehouses, restaurants, hospitals, and homes to assist human beings.


Rohan Chandra is currently a postdoctoral researcher at the University of Texas at Austin, hosted by Dr. Joydeep Biswas. Rohan obtained his B.Tech from Delhi Technological University, New Delhi, in 2016 and his MS and PhD from the University of Maryland in 2018 and 2022, respectively, advised by Dr. Dinesh Manocha. His doctoral thesis focused on autonomous driving in dense, heterogeneous, and unstructured traffic environments. He is a UMD’20 Future Faculty Fellow, an RSS’22 Pioneer, and a recipient of a UMD’20 summer research fellowship. He has published his work at top computer vision and robotics conferences (CVPR, ICRA, IROS) and has interned with the autonomous driving team at NVIDIA. He has served on the program committees of leading conferences in robotics, computer vision, artificial intelligence, and machine learning.