Workshop Focus
Digitizing documents across Indian languages presents distinct technical and operational challenges due to script diversity, complex layouts, degraded archival material, and the prevalence of handwritten text. MDDIL’25 addressed these challenges by emphasizing the need for robust standards, interoperable systems, high-quality datasets, and deployment-ready solutions that can operate at scale. The discussions highlighted the importance of inclusive digital access to printed, handwritten, archival, and born-digital content across domains such as education, governance, healthcare, and cultural heritage.
Technical Themes
The workshop covered a broad range of topics, including:
- Advances in OCR and Document AI tailored to Indian scripts
- Engineering standards and APIs for interoperable OCR systems
- Layout analysis, script variability, and dataset curation
- AI-based digitization of manuscripts and historical documents
- Policy perspectives and pathways for large-scale adoption
Speakers and Contributions


MDDIL’25 featured invited talks from experts representing academia, industry, and government, including Shri Amitabh Nag, Smt. Asha Singh, Shri Vijay Kumar, Prof. M. Balakrishnan, Prof. Ramesh C. Gaur, Prof. Santanu Chaudhury, Lt. Col. Anant Sinha, Atul Rai, Bhaskar Arun, and Nitin Singh.
The presentations covered topics such as privacy-aware OCR systems for healthcare, multimodal AI platforms for legal and crime analysis, digitization of large literary archives, accessibility solutions for visually impaired users, and AI-driven preservation of India’s manuscript heritage.
Program Highlights
The workshop program included plenary talks, invited lectures, and consortium presentations from IIIT Hyderabad, IIT Delhi, IIT Bombay, IIT Jodhpur, Punjabi University, and CDAC Noida. A dedicated panel session examined the need for standards and interoperability in Indic OCR, emphasizing collaboration between research institutions, industry, and policymakers. The workshop concluded with closing remarks by Prof. C. V. Jawahar.
Discussion and Engagement
An interactive question-and-answer session enabled participants to engage directly with the speakers, leading to discussions on low-resource language challenges, deployment considerations, and emerging research opportunities. The exchange encouraged cross-institutional collaboration and knowledge sharing.
Organization and Acknowledgements
MDDIL’25 was organized by a consortium of six premier institutions under the National Language Translation Mission (Bhashini), supported by the Ministry of Electronics and Information Technology (MeitY). The organizers acknowledge the support of Prof. Chetan Arora and his team—Deepa, Kahaan, and Achyuth—for their contributions to the successful execution of the workshop.
Concluding Remarks
By bringing together stakeholders from research, policy, and practice, MDDIL’25 reinforced the importance of coordinated efforts in advancing multilingual document digitization. The workshop outlined key priorities for the development of scalable, inclusive, and interoperable OCR systems, contributing to India’s broader digital and linguistic inclusion goals.

