ICVGIP 2025

Indian Language OCR: Current Status, Challenges, and Future Directions

In Conjunction with ICVGIP 2025

Dec 17, 2025, IIT Mandi

Topic of the Workshop

This workshop will address the challenges and opportunities in developing Optical Character Recognition (OCR) systems for Indian languages, encompassing printed documents, handwritten manuscripts, and scene text. The diversity of scripts, writing styles, and document formats in India presents unique research problems that require innovative approaches. The workshop will present recent advances, datasets, tools, and applications, while also identifying open challenges and future research directions. Robust multilingual OCR is central to ongoing digitization efforts in governance, education, and industry, and is a critical enabler for the broader success of Artificial Intelligence (AI) and Large Language Models (LLMs) in the Indian context.

Keynote Speakers

Soumyadeep Dey
Microsoft R&D

Bio of Speaker:

Soumyadeep Dey is a Senior Applied Scientist at Microsoft, India. He received his Ph.D. from the Department of Computer Science & Engineering at IIT Kharagpur in 2019. His research interests include image processing, computer vision, document segmentation, and applications of machine learning techniques for resource-constrained devices.

Abstract:

Smartphones have become ubiquitous computing devices due to their mobility and ease of use. However, they present unique challenges for AI/ML because of limited processing power, memory, storage, and battery life, variable network connectivity, and data privacy concerns. This talk will discuss several challenges in developing AI/ML models for edge computing, focusing on applications such as document-image cleanup and segmentation.

Ravi Kiran Sarvadevabhatla
IIIT Hyderabad

Bio of Speaker:

Dr. Ravi Kiran Sarvadevabhatla is an Associate Professor at IIIT Hyderabad and a core contributor to national AI initiatives, including leading the vision projects for the government-funded BharatGen LLM. He holds an MS from the University of Washington, Seattle, and a Ph.D. from IISc (2018), where his thesis won the IUPRAI Best Thesis Award. His research interests lie at the intersection of AI, multimodal multimedia data, robotics, and human-computer interaction. (For more details, visit: https://ravika.github.io/)

Abstract:

BharatGen is a government-funded, mission-mode initiative aimed at building sovereign, multimodal, and multilingual foundation models for India. The talk will briefly introduce BharatGen and present work on document foundation models, including the open-source release of Patram-7B, India’s first vision–language document foundation model. It will outline the capabilities enabled by such models for layout-aware and multilingual document understanding, and describe tools developed along the way for evaluation, visualization, and error analysis. The talk will then discuss applications currently being developed. It will also touch upon lessons from building practical vision–language systems, including scenarios where inputs extend beyond documents. The talk will conclude by highlighting capabilities unlocked by document foundation models and open research directions.

Organisers

Ajoy Mondal

IIIT Hyderabad

ajoy.mondal@iiit.ac.in

Anand Mishra

IIT Jodhpur

mishra@iitj.ac.in

Ankur Rana

Punjabi University

arana@pbi.ac.in

Chetan Arora

IIT Delhi

chetan@cse.iitd.ac.in

C. V. Jawahar

IIIT Hyderabad

jawahar@iiit.ac.in

Ganesh Ramakrishnan

IIT Bombay

ganesh@cse.iitb.ac.in

Gurpreet Singh Lehal

IIIT Hyderabad

gs.lehal@research.iiit.ac.in

Ravi Kiran Sarvadevabhatla

IIIT Hyderabad

ravi.kiran@iiit.ac.in

Tushar Patnaik

CDAC Noida

tusharpatnaik@cdac.in

Venkatapathy Subramanian

IIT Bombay

venkatapathy@cse.iitb.ac.in

Program Schedule

Time	Session	Speakers
9:00 - 9:15	Introduction and Welcome	Prof. Anand Mishra
9:15 - 9:50	Keynote-1: AI/ML for Mobile Devices @ Microsoft-IDC	Soumyadeep Dey (Session Chair: Anand Mishra)
9:50 - 10:50	Student Presentations, Session 1 (4 presentations, 15 min each)	Session Chair: Ajoy Mondal
10:50 - 11:00	Poster Lightning Talks (5 presentations, 2 min each)	Session Chair: Anand Mishra
11:00 - 12:00	Coffee Break (with Poster/Demo Displays)
12:00 - 12:35	Keynote-2: Document Foundation Models and Applications at BharatGen	Ravi Kiran S (Session Chair: Chetan Arora)
12:35 - 12:55	Student Presentations, Session 2 (2 presentations, 10 min each)	Session Chair: Ajoy Mondal
12:55 - 1:00	Concluding Remarks	Prof. C. V. Jawahar

Student Presentation Details

S. No	Presenter	Title	Track
Student Presentations, Session 1 (Session Chair: Dr. Ajoy Mondal; 15 min per presentation)
1	Anik De	IndicPhotoOCR	Oral, Demo
2	Dikshant Sharma	X-STAR: Cross-lingual Scene Text-aware Image Retrieval	Oral, Demo
3	Evani Lalitha	SemiHastakshar: Generalizable Indic Handwritten OCR through Semi-Supervised Learning	Oral, Poster
4	Sahithi Kukkala	IndicDLP: A Foundational Dataset for Multi-Lingual and Multi-Domain Document Layout Parsing	Oral
Poster Lightning Talks (Session Chair: Anand Mishra; 2 min per presentation)
1	Radha Krishna Deshpande	Intelligent Search Engine	Poster, Demo
2	Shashank Krishna Vempati	Lipikar: A Unified Multilingual OCR Ecosystem for 22+ Indian Languages	Demo
3	Pratyush Jena	Unveiling Text in Challenging Stone Inscriptions: A Character-Context-Aware Patching Strategy for Binarization	Poster
4	K Lokesh	Recognition and Restoration of Historical Manuscript	Poster
5	Shaon Bhattacharyya	FormLens: From Ink to Insight with Adapting Vision-Language Models for Handwritten Form Digitization	Poster, Demo
Student Presentations, Session 2 (Session Chair: Dr. Ajoy Mondal; 10 min per presentation)
1	Saumya Mundra	MIST: Multilingual Incidental Dataset for Scene Text Detection	Oral
2	Avijit Dasgupta	Are We There Yet? Exploring the Capabilities of MLLMs in Assistive AI Applications	Oral