CVIT Makes a Mark at ICVGIP 2025

By Ann Jacob, 30 Jan



December 2025

ICVGIP 2025, hosted this year at IIT Mandi with support from IUPRAI, turned into a strong showcase of IIIT Hyderabad’s research depth and community engagement — led prominently by the Centre for Visual Information Technology (CVIT).

Shaping the Conference

Although IIT Mandi hosted the event, CVIT members held meaningful roles at several levels of the conference:

Prof. C.V. Jawahar, Head of CVIT – guiding expert voice and advisory contributor
Dr. Ajoy Mondal – active member of the organizing/computing committees and workshop leader
Dr. Gurupreet Singh Lehal – contributor to academic program activities
Dr. Ravi Kiran Sarvadevabhatla – researcher, presenter, and session facilitator

Working alongside faculty from the IITs, these CVIT members from IIIT Hyderabad strengthened the scientific and academic foundation of the event.

Eleven Research Papers Across Core Computer Vision Themes

IIIT Hyderabad had 11 papers accepted and presented, reinforcing the institute’s research leadership in India.

The contributions spanned:

OCR and handwritten text recognition for Indian languages
Semi-supervised learning and domain transfer
Evaluation of multimodal LLMs for assistive AI
Cultural-context datasets involving food and visual understanding

Beyond the numbers, these papers reflect CVIT's focus on problems rooted in India's linguistic and cultural realities, where research outcomes can have direct real-world impact.

Plenary Talk by Prof. C. V. Jawahar, Head of the Centre for Visual Information Technology (CVIT)

Title: Anticipating the Road Ahead in a Landscape of Disrupted Paths

Camera-based perception has advanced dramatically, with modern deep neural models achieving impressive results across diverse and challenging road environments worldwide. Yet a fundamental question remains open: how do we anticipate what humans (drivers, riders, or pedestrians) intend to do, especially on the unstructured, heterogeneous roads that characterize countries like India? Understanding the unfolding dynamics of human actions from video requires models that go beyond classical computer vision and scene understanding.

Computer vision itself is in a moment of transition. Decades of geometric and photometric reasoning have given way to data-driven learning at scale. This shift prompts deeper reflection: what foundational ideas from classical vision remain essential? How do we cast and solve problems? How do we balance algorithmic novelty, incremental improvements over the state of the art, and immediate impact? This talk distils experiences from working on perception and intention-prediction models in Indian traffic, settings that expose the limitations of existing approaches and open new research questions.

Community Touchpoints: Poster, Stall, and Student Interaction

A poster presentation showcased ongoing work by student researchers
A technology stall offered hands-on interaction with tools, demos, and datasets being built at IIIT Hyderabad

These touchpoints sparked conversations, mentorship moments, and collaborative curiosity among researchers, students, and industry representatives.

Leading Conversations: Workshop on Indian Language OCR

CVIT researchers co-organised a focused workshop, "Indian Language OCR: Current Status, Challenges, and Future Directions", at ICVGIP 2025. The workshop served as a platform to:

Bring together research on printed, handwritten, and scene text OCR for Indian languages.
Present advances, datasets, tools, and applications to a broader audience.
Identify open challenges and future research trajectories, especially for Indian scripts and multilingual AI tasks.

This workshop addresses the challenges and opportunities in developing Optical Character Recognition (OCR) systems for Indian languages, covering printed documents, handwritten manuscripts, and scene text. India’s rich linguistic diversity—spanning multiple scripts, writing styles, and document formats—poses unique research challenges that demand innovative and scalable solutions. The workshop presents recent advances in OCR methodologies, datasets, tools, and real-world applications, while also highlighting open problems and outlining future research directions. Robust multilingual OCR is a cornerstone of national digitization efforts in governance, education, and industry, and serves as a critical enabler for the broader success of Artificial Intelligence (AI) systems and Large Language Models (LLMs) in the Indian context.

Organisers from CVIT – IIITH included:

C.V. Jawahar – Senior academic and adviser, shaping workshop content and research focus.
Ajoy Mondal – played a key role in workshop leadership and student engagement.
Gurupreet Singh Lehal and Ravi Kiran Sarvadevabhatla – supported workshop structuring and domain engagement.

This workshop actively connected researchers and students, fostering collaboration toward building robust, inclusive OCR solutions for Indian contexts.

Keynote Speaker: Dr. Ravi Kiran Sarvadevabhatla, Associate Professor at IIIT Hyderabad

Abstract of Talk:

BharatGen is a government-funded, mission-mode initiative focused on building sovereign, multimodal, and multilingual foundation models for India. This talk will provide a brief overview of the BharatGen initiative and present recent advances in document foundation models, including the open-source release of Patram-7B, India’s first vision–language document foundation model.

The talk outlined the capabilities such models enable for layout-aware and multilingual document understanding, and described the supporting tools developed for evaluation, visualization, and systematic error analysis. It also discussed real-world applications currently under development and shared insights gained from building practical vision–language systems, including scenarios where inputs extend beyond traditional document formats.

The keynote concluded by highlighting the transformative capabilities unlocked by document foundation models and by identifying open research challenges and future directions in multilingual and multimodal document intelligence.

Support and Sponsorship

Adding to the institute's presence, iHub-Data, the applied research and translational innovation centre at IIIT Hyderabad, served as a Silver Sponsor, further underscoring the institute's commitment to strengthening India's vision community.

A Remarkable Footprint

With 11 accepted papers, active workshop leadership, visible student-driven interactions, and sponsorship support, CVIT and IIIT Hyderabad emerged as one of the strongest contributors to ICVGIP 2025 — driving research, capacity building, and collaboration for the broader Indian vision ecosystem.