Object-centric 3D Scene Understanding from Videos
Abstract:
The growing demand for immersive, interactive experiences has underscored the importance of 3D data in understanding our surroundings. Traditional methods for capturing 3D data are often complex and equipment-intensive. In contrast, Yash Bhalgat's research aims to utilize unconstrained videos, such as those from augmented reality glasses, to effortlessly capture scenes and objects in their full 3D complexity.
In his talk at IIIT-H on January 6, 2024, Yash Bhalgat first described a method that incorporates epipolar geometry priors into multi-view Transformer models, enabling objects to be identified across extreme pose variations. He then presented his recent work on 3D object segmentation using 2D pre-trained foundation models, and finally touched upon his ongoing work on object-centric dynamic scene representations.
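To give a flavour of what an epipolar geometry prior in a multi-view Transformer might look like, the sketch below biases cross-view attention toward tokens lying near the epipolar line of each query pixel. This is only an illustrative sketch, not the method presented in the talk: the function name, the Gaussian form of the bias, and the `sigma` parameter are all assumptions.

```python
import numpy as np

def epipolar_attention_bias(F, pts_a, pts_b, sigma=2.0):
    """Illustrative sketch (not the talk's actual method): for each query
    pixel x_a in view A, bias attention toward view-B tokens lying near
    its epipolar line l = F @ x_a.

    F: (3, 3) fundamental matrix mapping view-A points to view-B lines
    pts_a: (N, 2) query pixel coordinates in view A
    pts_b: (M, 2) token pixel coordinates in view B
    Returns an (N, M) additive attention bias: 0 on the epipolar line,
    increasingly negative with distance from it.
    """
    xa = np.hstack([pts_a, np.ones((len(pts_a), 1))])  # homogeneous (N, 3)
    xb = np.hstack([pts_b, np.ones((len(pts_b), 1))])  # homogeneous (M, 3)
    lines = xa @ F.T                                   # epipolar lines in B, (N, 3)
    # Point-to-line distance: |l . x| / sqrt(a^2 + b^2) for l = (a, b, c)
    num = np.abs(lines @ xb.T)                         # (N, M)
    denom = np.linalg.norm(lines[:, :2], axis=1, keepdims=True) + 1e-8
    dist = num / denom
    # Gaussian-style log-bias, added to attention logits before softmax
    return -(dist / sigma) ** 2
```

In practice such a bias would be added to the attention logits of a cross-view attention layer, softly restricting each query's attention to geometrically plausible correspondences instead of the full image.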
Bio:
Yash Bhalgat is a 3rd-year PhD student at the University of Oxford's Visual Geometry Group, supervised by Andrew Zisserman, Andrea Vedaldi, Joao Henriques, and Iro Laina. His research is broadly in 3D computer vision and machine learning, with a specific focus on geometry-aware deep networks (transformers), 3D reconstruction, and neural rendering. Yash also works at the intersection of 3D and LLMs. He was previously a Senior Researcher at Qualcomm AI Research, working on efficient deep learning. Yash received a Master's in Computer Science from the University of Michigan, Ann Arbor, and a Bachelor's in Electrical Engineering (with a CS minor) from IIT Bombay.
