Bio:
Rajat Talak is a Research Scientist in the Department of Aeronautics and Astronautics at the Massachusetts Institute of Technology (MIT). Prior to this, he was a Postdoctoral Associate in the same department from 2020 to 2022. He received his Ph.D. from the Laboratory for Information and Decision Systems at MIT in 2020. He holds a Master of Science degree from the Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore, India, and a B.Tech. from the Department of Electronics and Communication Engineering, National Institute of Technology, Surathkal, India. He is the recipient of the ACM MobiHoc 2018 Best Paper Award and the Gold Medal for his M.Sc. thesis from the Indian Institute of Science.
Finding Patterns in Pictures
Prof. B.S. Manjunath
Date: 09/02/2024
Abstract:
Explore the evolving landscape of computer vision as Prof. B.S. Manjunath delves into its broader applications beyond everyday object recognition. The discussion will span diverse fields such as neuroscience, materials science, ecology, and conservation. Prof. Manjunath will showcase innovative applications, including a novel approach to malware detection and classification. The talk will conclude with an overview of a unique platform designed to foster reproducible computer vision and encourage interdisciplinary collaborations between vision/ML researchers and domain scientists through the sharing of methods and data.
Bio:
Prof. B.S. Manjunath is a distinguished academic and Chair of the ECE Department at UCSB, California. He directs the Center for Multimodal Big Data Science and Healthcare. He has published over 300 peer-reviewed journal and conference articles, and his publications have been cited extensively. He is an inventor on 24 patents and co-edited the first book on the ISO/MPEG-7 multimedia content representation standard. He directed the NSF/ITR-supported center on Bio-Image Informatics. His team is developing the open-source BisQue image informatics platform, which helps users manage, annotate, analyze, and share their multimodal images in a scalable manner, with a focus on reproducible science.
Domain Adaptation for Fair and Robust Computer Vision
Tarun Kalluri
Date: 09/01/2024
Abstract:
While recent progress has significantly advanced the state of the art in computer vision across several tasks, these models still generalize poorly to domains and categories under-represented in the training set, posing a direct challenge to fair and inclusive computer vision. In this talk, I will present my recent efforts towards improving generalizability and robustness in computer vision using domain adaptation. First, I will describe our work on scaling domain adaptation to large-scale datasets using metric learning. Next, I will introduce GeoNet, our new dataset effort aimed at benchmarking and developing novel algorithms for geographical robustness in various vision tasks. Finally, I will discuss future research directions on leveraging rich multimodal (vision and language) data to improve the adaptation of visual models to new domains.
Bio:
Tarun Kalluri is a fourth-year PhD student in the Visual Computing Group at UC San Diego. Prior to that, he received a bachelor's degree from the Indian Institute of Technology Guwahati and worked as a data scientist at Oracle. His research interests lie in label- and data-efficient learning from images and videos, domain adaptation, and improving fairness in AI. He is a recipient of the IPE PhD fellowship for 2020-21.
What Do Generative Image Models Know? Understanding their Latent Grasp of Reality and Limitations
Anand Bhattad
Date: 06/01/2024
Abstract:
Recent generative image models like StyleGAN, Stable Diffusion, and VQGANs have achieved remarkable results, producing images that closely mimic real-world scenes. However, their true understanding of visual information remains unclear. My research delves into this question, exploring whether these models operate on purely abstract representations or possess an understanding akin to traditional rendering principles. My findings reveal that these models encode a wide range of scene attributes, such as lighting, albedo, depth, and surface normals, suggesting a nuanced understanding of image composition and structure. Despite this, they exhibit significant shortcomings in depicting projective geometry and shadow consistency, often misrepresenting spatial relationships and light interactions and producing subtle visual anomalies.
This talk aims to illuminate the complex narrative of what generative image models truly understand about the visual world, highlighting their capabilities and limitations. I’ll conclude with insights into the broader implications of these findings and discuss potential directions for enhancing the realism and utility of generative imagery in applications demanding high fidelity and physical plausibility.
Bio:
Anand Bhattad is a Research Assistant Professor at the Toyota Technological Institute at Chicago (TTIC). Before this role, he earned his PhD from the University of Illinois Urbana-Champaign (UIUC), advised by David Forsyth. His primary research interests lie in computer vision, with a specific focus on the knowledge embedded in generative models and its applications to computer vision, computer graphics, and computational photography. His recent work, DIVeR, received a best paper nomination at CVPR 2022. He was recognized as an Outstanding Reviewer at ICCV 2023 and an Outstanding Emergency Reviewer at CVPR 2021.
Object-centric 3D Scene Understanding from Videos
Yash Bhalgat
Date: 06/01/2024
Abstract:
The growing demand for immersive, interactive experiences has underscored the importance of 3D data in understanding our surroundings. Traditional methods for capturing 3D data are often complex and equipment-intensive. In contrast, Yash Bhalgat's research aims to utilize unconstrained videos, such as those from augmented reality glasses, to effortlessly capture scenes and objects in their full 3D complexity.
In his talk at IIIT-H on January 6, 2024, Yash Bhalgat described a method for incorporating epipolar geometry priors into multi-view Transformer models, enabling object identification across extreme pose variations. He then presented his recent work on 3D object segmentation using 2D pre-trained foundation models, and concluded with his ongoing work on object-centric dynamic scene representations.
Bio:
Yash Bhalgat is a third-year PhD student in the University of Oxford's Visual Geometry Group, supervised by Andrew Zisserman, Andrea Vedaldi, Joao Henriques, and Iro Laina. His research is broadly in 3D computer vision and machine learning, with a specific focus on geometry-aware deep networks (transformers), 3D reconstruction, and neural rendering. Yash also works at the intersection of 3D and LLMs. Previously, he was a Senior Researcher at Qualcomm AI Research, working on efficient deep learning. He received a master's degree in Computer Science from the University of Michigan, Ann Arbor, and a bachelor's degree in Electrical Engineering (with a CS minor) from IIT Bombay.
3D Representation and Analytics for Computer Vision
Prof. Chandra Kambhamettu
Date: 09/01/2024
Abstract:
Data in computer vision is commonly either in a homogeneous format in 2D projective space (e.g., images and videos) or a heterogeneous format in 3D Euclidean space (e.g., point clouds). One advantage of 3D data is its invariance to illumination-dependent appearance. 3D point clouds are now a vital data source for vision tasks such as autonomous driving, robotic perception, and scene understanding. Thus, deep learning on 3D points has become an essential branch of geometry-based research. However, deep learning over unstructured point clouds is quite challenging. Moreover, due to the 3D data explosion, new representations are necessary for compression and encryption. We therefore introduce a new sphere-based representation (3DSaint) to model a 3D scene and further utilize it in deep networks. Our representation produces state-of-the-art results on several 3D understanding tasks, including 3D shape classification. We also present differential geometry-based networks for 3D analysis tasks.
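As context for why learning over unstructured point clouds is challenging: a point cloud is an unordered set of points, so a network's output should not depend on the order in which the points arrive. The sketch below shows the standard generic workaround in a PointNet-style classifier (a shared per-point MLP followed by symmetric max-pooling); it is an illustrative example only, not the 3DSaint sphere-based representation presented in the talk, and all class names and dimensions are hypothetical.

```python
# Minimal PointNet-style classifier sketch (illustrative only; NOT the
# 3DSaint sphere-based representation described in the talk).
import torch
import torch.nn as nn

class PointCloudClassifier(nn.Module):
    def __init__(self, num_classes: int = 40):
        super().__init__()
        # Shared per-point MLP: each 3D point is embedded independently.
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
        )
        # Classification head applied to the pooled global feature.
        self.head = nn.Linear(256, num_classes)

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, num_points, 3), in arbitrary order.
        per_point = self.point_mlp(points)      # (B, N, 256)
        # Max-pooling is a symmetric function, so the result is invariant
        # to the ordering of the input points -- the core difficulty of
        # learning on unstructured point clouds.
        global_feat, _ = per_point.max(dim=1)   # (B, 256)
        return self.head(global_feat)           # (B, num_classes)

# Usage: classify a batch of 2 clouds with 1024 points each.
model = PointCloudClassifier(num_classes=40)
logits = model(torch.randn(2, 1024, 3))
print(logits.shape)  # torch.Size([2, 40])
```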
Bio:
Dr. Chandra Kambhamettu is a Full Professor in the Computer Science Department at the University of Delaware, where he directs the Video/Image Modelling and Synthesis (VIMS) group. His research interests span Artificial Intelligence and Data Science, including computer vision, machine learning, and big-data visual analytics. His lab focuses on novel schemes for multimodal image analysis. His recent research includes image analysis for the visually impaired, 3D point cloud analysis, drone- and vehicle-based camera imagery acquisition, analysis, and reconstruction, and plant science image analysis. His work on nonrigid structure from motion was published at CVPR 2000 and is cited as one of the first two papers in the field. Several of Dr. Kambhamettu's projects also address problems with a high impact on life on Earth, such as Arctic sea ice observation, with applications to mammal habitat quantification and climate change, and hurricane image studies, among others. Before joining UD, he was a research scientist at NASA-Goddard, where he received the 1995 "Excellence in Research Award." He also received the NSF CAREER Award in 2000 and NASA's "Group Achievement Award" in 2013 for his work on the deployment of the Arctic Collaborative Environment.