MDDIL’25 Workshop
India’s linguistic mosaic includes 22 officially recognized languages, numerous scripts, and a vast range of dialects, which makes digital inclusion a major challenge. Most online content remains English-centric, limiting access for much of the population. Demand is growing to digitize printed materials such as manuscripts, government records, legal documents, textbooks, and archives into searchable, machine-readable formats. Optical Character Recognition (OCR) is central to this digitization, but Indian languages pose particular challenges: diverse writing systems, varied fonts, artifacts in historical documents, and inconsistent layouts. Advances in AI are promising, yet widespread adoption requires standardized APIs, tools, and datasets to achieve interoperability and scalability.
