Workshop on Multilingual Document Digitization for Indian Languages: Standards, Tools, and Trends

In conjunction with the 11th International Conference on Pattern Recognition and Machine Intelligence (PReMI'25)

December 11, 2025

IIT Delhi

MDDIL’25 Workshop

India’s linguistic mosaic includes 22 official languages, many scripts, and myriad dialects, making digital inclusion a major challenge. Most online content remains English-centric, limiting access for much of the population. There is a growing demand to digitize printed materials such as manuscripts, government records, legal documents, textbooks, and archives into searchable, machine-readable formats. Optical Character Recognition (OCR) is central to digitization, but Indian languages present challenges like diverse writing systems, fonts, historical artifacts, and varied layouts. AI advances are promising, yet widespread adoption needs standardized APIs, tools, and datasets to achieve interoperability and scalability.

Key Topics

  • Explore the latest in OCR & Document AI for Indian languages with the OCR Consortium.
  • Define API standards and engineering best practices for seamless interoperability.
  • Tackle challenges in text representation, layout analysis, and dataset curation.
  • Discover how AI drives digitization of classical manuscripts.
  • Engage with experts and policymakers shaping the future of digital inclusion.
  • Chart actionable pathways for research, policy, and large-scale deployment.

About Us

We are a consortium of six leading institutions (IIIT Hyderabad, IIT Delhi, IIT Bombay, IIT Jodhpur, Punjabi University, and CDAC Noida) working under the National Language Translation Mission (Bhashini), funded by MeitY. Our team is developing advanced OCR technology for Indian languages across three modalities: printed, handwritten, and scene text. Together, we aim to make digital content accessible in all Indian languages through research, innovation, and national collaboration. For more details, visit: ilocr.iiit.ac.in

Speakers

Plenary Session Speakers

Shri Amitabh Nag

CEO, DIBD

Shri Vijay Kumar

MeitY, Govt. of India

Invited Speakers

Lt. Col. Anant Sinha

Administrator, The Asiatic Society, Kolkata

Prof. M. Balakrishnan

Distinguished Visiting Professor at Plaksha University, Honorary Professor at CSE IIT Delhi, Founder ASSISTECH, SIT IIT Delhi

Prof. Ramesh C. Gaur

Professor & Dean, Indira Gandhi National Centre for the Arts

Prof. Santanu Chaudhury

IIT Delhi

Industry Speakers

Atul Rai

CEO and Co-Founder, Staqu Technologies

Bhaskar Arun

Lead Data Scientist, Tata 1mg

Nitin Singh

Head of Rekhta Labs, R&D Wing, Rekhta Foundation

Program Schedule


Time | Session | Speaker
11:00-12:00 | Consortium Presentations | Dr. Ankur Rana (PU); Anike De (IITJ); Bhupendra Kumar (CDAC Noida); Dr. Venkatapathy S (IITB)
12:00-13:00 | Industry Talks |
12:00-12:20 | Title: TBD | Nitin Singh
12:20-12:40 | Title: TBD | Atul Rai
12:40-13:00 | FocusFormer to FocusVLM: Building Specialized, Efficient, and Privacy-Preserving OCR Systems for Real-World Healthcare Documents | Bhaskar Arun
13:00-14:00 | Lunch Break |
14:00-14:40 | Plenary Sessions |
14:00-14:20 | Keynote 1 | Shri Amitabh Nag
14:20-14:40 | Keynote 2 | Shri Vijay Kumar
14:40-15:00 | Standards & Interoperability in Indic OCR: APIs, Tools, Best Practices | Prof. Ravi Kiran S; Krishna Tulsyan; Venkatapathy S
15:00-16:20 | Invited Talks |
15:00-15:20 | Title: TBD | Prof. Ramesh C. Gaur
15:20-15:40 | Title: TBD | Prof. Santanu Chaudhury
15:40-16:00 | Heritage Meets Technology | Lt. Col. Anant Sinha
16:00-16:20 | Challenges in accessing STEM documents by the visually impaired | Prof. M. Balakrishnan
16:20-16:30 | Closing Remarks | Prof. C. V. Jawahar

Organizers

Ajoy Mondal

IIIT Hyderabad

ajoy.mondal@iiit.ac.in

Anand Mishra

IIT Jodhpur

mishra@iitj.ac.in

Ankur Rana

Punjabi University

arana@pbi.ac.in

Chetan Arora

IIT Delhi

chetan@cse.iitd.ac.in

C. V. Jawahar

IIIT Hyderabad

jawahar@iiit.ac.in


Ganesh Ramakrishnan

IIT Bombay

ganesh@cse.iitb.ac.in

Gurupreet Singh Lehal

IIIT Hyderabad

gs.lehal@research.iiit.ac.in

Ravi Kiran Sarvabhatla

IIIT Hyderabad

ravi.kiran@iiit.ac.in

Tushar Patnaik

CDAC Noida

tusharpatnaik@cdac.in

Venkatapathy Subramanian

IIT Bombay

venkatapathy@cse.iitb.ac.in