Workshop on Multilingual Document Digitization for Indian Languages: Standards, Tools, and Trends.

In conjunction with the 11th International Conference on Pattern Recognition and Machine Intelligence (PReMI'25).

December 11, 2025.

IIT Delhi

MDDIL’25 Workshop

India’s linguistic mosaic includes 22 official languages, many scripts, and myriad dialects, making digital inclusion a major challenge. Most online content remains English-centric, limiting access for much of the population. There is a growing demand to digitize printed materials such as manuscripts, government records, legal documents, textbooks, and archives into searchable, machine-readable formats. Optical Character Recognition (OCR) is central to digitization, but Indian languages present challenges like diverse writing systems, fonts, historical artifacts, and varied layouts. AI advances are promising, yet widespread adoption needs standardized APIs, tools, and datasets to achieve interoperability and scalability.

Key Topics

  • Explore the latest trends in OCR & Document AI for Indian languages with the OCR Consortium.
  • Define API standards and engineering best practices for seamless interoperability.
  • Tackle challenges in text representation, layout analysis, and dataset curation.
  • Discover how AI drives digitization of classical manuscripts.
  • Engage with experts and policymakers shaping the future of digital inclusion.
  • Chart actionable pathways for research, policy, and large-scale deployment.

About Us

We are a consortium of six leading institutions (IIIT Hyderabad, IIT Delhi, IIT Bombay, IIT Jodhpur, Punjabi University, and CDAC Noida) working under the National Language Translation Mission (Bhashini), funded by MeitY. Our team is developing advanced OCR technology for Indian languages across three modalities: printed, handwritten, and scene text. Together, we aim to make digital content accessible in all Indian languages through research, innovation, and national collaboration. For more details, visit: ilocr.iiit.ac.in

Speakers

TBD

Program (Tentative)


Time | Session | Speaker
09:30 - 10:00 | Opening & Keynote: OCR for Digital Inclusion | TBD
10:00 - 11:15 | Standards & Interoperability in Indic OCR: APIs, Tools, Best Practices | TBD
11:15 - 11:30 | Tea Break | TBD
11:30 - 12:45 | OCR Challenges & Opportunities: Layouts, Datasets, Benchmarks, Future Directions | TBD
12:45 - 13:00 | Closing & Roadmap Suggestions | -

Organizers

Ajoy Mondal

IIIT Hyderabad

ajoy.mondal@iiit.ac.in

Anand Mishra

IIT Jodhpur

mishra@iitj.ac.in

Ankur Rana

Punjabi University

arana@pbi.ac.in

Chetan Arora

IIT Delhi

chetan@cse.iitd.ac.in

C. V. Jawahar

IIIT Hyderabad

jawahar@iiit.ac.in


Ganesh Ramakrishnan

IIT Bombay

ganesh@cse.iitb.ac.in

Gurpreet Singh Lehal

IIIT Hyderabad

gs.lehal@research.iiit.ac.in

Ravi Kiran Sarvadevabhatla

IIIT Hyderabad

ravi.kiran@iiit.ac.in

Tushar Patnaik

CDAC Noida

tusharpatnaik@cdac.in

Venkatapathy Subramanian

IIT Bombay

venkatapathy@cse.iitb.ac.in