
Workshop on Multilingual Document Digitization for Indian Languages: Standards, Tools, and Trends

In Conjunction with 11th International Conference on Pattern Recognition and Machine Intelligence (PReMI'25)

December 11, 2025

IIT Delhi


MDDIL’25 Workshop

India’s linguistic mosaic includes 22 official languages, many scripts, and myriad dialects, making digital inclusion a major challenge. Most online content remains English-centric, limiting access for much of the population. There is a growing demand to digitize printed materials such as manuscripts, government records, legal documents, textbooks, and archives into searchable, machine-readable formats. Optical Character Recognition (OCR) is central to digitization, but Indian languages present challenges like diverse writing systems, fonts, historical artifacts, and varied layouts. AI advances are promising, yet widespread adoption needs standardized APIs, tools, and datasets to achieve interoperability and scalability.

Key Topics

Explore OCR & Document AI for Indian languages with the OCR Consortium.
Define API standards and engineering best practices for seamless interoperability.
Tackle challenges in text representation, layout analysis, and dataset curation.
Discover how AI drives digitization of classical manuscripts.
Engage with experts and policymakers shaping the future of digital inclusion.
Chart actionable pathways for research, policy, and large-scale deployment.

About Us

We are a consortium of six leading institutions (IIIT Hyderabad, IIT Delhi, IIT Bombay, IIT Jodhpur, Punjabi University, and CDAC Noida) working under the National Language Translation Mission (Bhashini), funded by MeitY. Our team is developing advanced OCR technology for Indian languages across three modalities: printed, handwritten, and scene text. Together, we aim to make digital content accessible in all Indian languages through research, innovation, and national collaboration. For more details, visit: ilocr.iiit.ac.in

Speakers

Plenary Session Speakers

Shri Amitabh Nag

CEO, DIBD

Smt. Asha Singh

MeitY, HCC (TDIL) Division, Ministry of Electronics and Information Technology

Shri Vijay Kumar

MeitY, Govt. of India

Invited Speakers

Lt. Col. Anant Sinha

Administrator,
The Asiatic Society, Kolkata

Prof. M Balakrishnan

Distinguished Visiting Professor at Plaksha University, Honorary Professor at CSE IIT Delhi, Founder ASSISTECH, SIT IIT Delhi

Prof. Ramesh C. Gaur

Professor & Dean,
Indira Gandhi National Centre for the Arts

Prof. Santanu Chaudhury

IIT Delhi

Industry Session Speakers

Atul Rai

CEO and Co-Founder
Staqu Technologies

Bhaskar Arun

Lead Data Scientist
Tata 1mg

Nitin Singh

Head of Rekhta Labs
R&D Wing, Rekhta Foundation

Program Schedule

Time	Session	Speaker
11:00-12:00	Consortium Presentations	Dr. Ankur Rana (PU), Krishna Tulsyan (IITH), Anike De (IITJ), Bhupendra Kumar (CDAC Noida), Dr. Venkatapathy S (IITB)
12:00-13:00	Industry Talk
12:00-12:20	Digitization at scale: Ground Realities & Open Problems	Nitin Singh
12:20-12:40	Jarvis One: A Multimodal, Knowledge-Augmented AI System for Large-Scale Crime Analysis	Atul Rai
12:40-13:00	FocusFormer to FocusVLM: Building Specialized, Efficient, and Privacy-Preserving OCR Systems for Real-World Healthcare Documents	Bhaskar Arun
13:00-14:00	Lunch Break
14:00-14:40	Plenary Sessions
14:00-14:40	Plenary Session	Shri Amitabh Nag
14:40-15:00	Standards & Interoperability in Indic OCR: APIs, Tools, Best Practices	Prof. Ravi Kiran S, Krishna Tulsyan, Venkatapathy S
15:00-16:20	Invited Talks
15:00-15:30	Preserving India’s Manuscript Heritage in the Age of AI: Challenges, Standards, and Pathways for Multilingual Digitization	Prof. Ramesh C. Gaur
15:30-15:50	Manuscript AI	Prof. Santanu Chaudhury
15:50-16:10	Heritage Meets Technology	Lt. Col. Anant Sinha
16:10-16:30	Challenges in accessing STEM documents by the visually impaired	Prof. M. Balakrishnan
16:30-16:40	Closing Remarks	Prof. C. V. Jawahar