Computer vision - Self-Supervised Representation Learning

 

Abstract:

Yash Patel during his presentation, discussed a range of topics related to his research interests, which primarily include Self-Supervised Representation Learning, Image Compression, Scene Text Detection and Recognition, Tracking and Segmentation in Videos, and 3D Reconstruction.
Specifically, he discussed the challenges associated with Training Neural Networks on Non-Differentiable Losses and presented various approaches for overcoming these challenges. In particular, he presented his proposed technique for training a neural network by minimizing a surrogate loss that approximates a target evaluation metric that may be non-differentiable. To achieve this, the surrogate is learned via a deep embedding method where the Euclidean distance between the prediction and the ground truth corresponds to the value of the evaluation metric. Additionally, he described his work on proposing a differentiable surrogate loss for the recall metric. To enable training with a very large batch size, which is crucial for metrics computed on the entire retrieval database, the speaker utilized an implementation that sidesteps the hardware constraints of the GPU memory. Furthermore, an efficient mixup approach that operates on pairwise scalar similarities was employed to virtually increase the batch size further.

Bio:

Yash Patel, a Ph.D. candidate at the Center for Machine Perception, Czech Technical University, advised by Prof. Jiri Matas and an esteemed alumnus of IIIT Hyderabad, visited our institute on 16th March 2023.

He holds a Bachelor in Technology with Honors by Research in Computer Science and Engineering from International Institute of Information Technology, Hyderabad (IIIT-H). During my undergrad, I was working with Prof. C.V. Jawahar at the Center for Visual Information Technology (CVIT). He also holds a Master's degree in Computer Vision from the Robotics Institute of Carnegie Mellon University, where he worked with Prof. Abhinav Gupta.

Yash's research interests are primarily focused on computer vision with expertise in areas such as self-supervised representation learning, image compression, scene text detection and recognition, tracking and segmentation in videos, and 3D reconstruction.