This year, CVIT projects secured these highly competitive grants. The selected projects span cutting-edge domains such as artificial intelligence, accessibility technologies, language computing, and multimodal learning—showcasing the institute’s commitment to solving complex real-world challenges through deep research and innovation.

Among the awardees are initiatives focused on building inclusive communication technologies for India’s linguistically and socially diverse population.

 

Multimodal AI for Inclusive Speech in India


One of the winning projects, led by Vineet Gandhi, is titled “IndiG Multimodal: Advancing Multimodal Machine Learning for Indian Languages.” The project aims to bring powerful speech accessibility technologies to Indian languages by combining advances in speech processing, computer vision, and machine learning.

Dr. Gandhi’s lab has already developed sophisticated speech technologies for English that are capable of:

  • Converting slurred speech caused by dysarthria into clear speech.
  • Transforming whispers into normal, voiced speech.
  • Generating speech from lip movements.
  • Producing speech from subtle throat vibrations.
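
Under the hood, these capabilities share a common pattern: map a degraded or silent input signal to a clean speech representation, typically a mel-spectrogram, and then synthesize audio from that representation with a vocoder. The following is a minimal, hypothetical PyTorch sketch of that spectrogram-to-spectrogram pattern; the model, layer sizes, and names are illustrative assumptions, not the lab’s actual architecture.

```python
# Hypothetical sketch of the spectrogram-to-spectrogram conversion pattern
# behind systems like whisper-to-speech; NOT the lab's actual model.
import torch
import torch.nn as nn

class SpeechConverter(nn.Module):
    """Maps a degraded input mel-spectrogram (e.g., whispered or dysarthric
    speech) to a clean target mel-spectrogram of the same length."""

    def __init__(self, n_mels: int = 80, d_model: int = 256, n_layers: int = 4):
        super().__init__()
        self.in_proj = nn.Linear(n_mels, d_model)   # spectrogram frame -> model space
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.out_proj = nn.Linear(d_model, n_mels)  # model space -> spectrogram frame

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, time, n_mels); positional encodings omitted for brevity
        return self.out_proj(self.encoder(self.in_proj(mel)))

# Toy usage: one ~3-second utterance at roughly 100 spectrogram frames/second.
model = SpeechConverter()
whispered = torch.randn(1, 300, 80)     # stand-in for a whispered input
clean_pred = model(whispered)           # predicted clean spectrogram, same shape
```

In a deployed system, the predicted spectrogram would be passed through a neural vocoder to produce an audible waveform, and the model would be trained on paired recordings, for example whispered and normally voiced versions of the same utterances.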

“These technologies can enable an app for people who cannot speak or struggle to speak,” explains Dr. Gandhi. “They can communicate through it, and the other person will hear a cleaner version of their speech.”

While such foundational models exist for English, equivalent large-scale AI models are scarce for Indian languages such as Hindi, Telugu, and Tamil. The project seeks to close this gap by developing strong base models for Indian languages and extending multimodal capabilities that combine speech, lip movement, and other physiological signals.

The motivation behind the proposal, Dr. Gandhi says, was straightforward:

“To write a useful problem statement—one that targets a critical and largely overlooked space at the intersection of speech technology, accessibility, and linguistic diversity.”

Building on years of validated research and working prototypes in English, the team now aims to localize and scale these technologies for India’s multilingual ecosystem. The impact could be far-reaching—empowering individuals with speech impairments, enabling silent or assistive speech interfaces, and making advanced voice technologies accessible to millions across India.

As Dr. Gandhi notes, the project’s success stems from “the large body of work we have already been doing in the lab,” which now stands poised to expand into India’s diverse linguistic landscape.

 

Building AI-Powered Bridges for Indian Sign Language


Another award-winning project tackles an equally critical accessibility challenge: advancing technology for Indian Sign Language (ISL).

India is home to an estimated 63 million deaf individuals, yet only about 500 qualified sign language instructors serve the entire country. This enormous gap highlights the urgent need for scalable digital tools that support communication and learning in sign language.

The project, titled “AI for Understanding Sign Languages,” is led by Ashutosh Modi (PI, IIT Kanpur), with C. V. Jawahar from IIIT-Hyderabad as Co-Principal Investigator.

Unlike spoken languages, sign language is a complex spatio-temporal multimodal system involving coordinated hand gestures, facial expressions, eye gaze, and body posture. Despite its linguistic richness, Indian Sign Language remains severely under-resourced in terms of datasets and AI tools.

To address this, the research team proposes to build one of the largest datasets for ISL, targeting over one million aligned video, text, and audio samples. Data will be gathered through multiple channels, including:

  • Studio recordings
  • YouTube video mining
  • A community-driven mobile application
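
To make “aligned” concrete, each record in such a corpus would pair a signing clip with its written (and, where available, spoken) counterpart, plus the timing information needed to line them up. The sketch below shows a hypothetical record layout in Python; all field names are assumptions for illustration, not the project’s actual schema.

```python
# Hypothetical layout for one aligned ISL sample; illustrative only.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ISLSample:
    video_path: str            # clip of the signer
    text: str                  # aligned written translation
    audio_path: Optional[str]  # aligned spoken audio, when available
    start_s: float             # alignment window within the source video
    end_s: float
    source: str                # "studio", "youtube", or "mobile_app"
    signer_id: str             # anonymized signer identifier

sample = ISLSample(
    video_path="clips/000123.mp4",
    text="Where is the ticket counter?",
    audio_path="audio/000123.wav",
    start_s=12.4, end_s=15.1,
    source="studio", signer_id="S-042",
)
```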

“We propose to investigate methods that analyze the linguistic nuances present in Indian sign languages,” explains Prof. Jawahar. “We also plan to explore how technology can be integrated into the conversational aspects of sign languages and sign language synthesis.”

The project aims to move beyond rigid template-based systems toward intelligent AI models capable of interpreting and generating natural sign language. Planned innovations include:

  • Transformer-based models for sign-to-text translation (sketched in simplified form after this list)
  • Generative models for producing sign language from speech or text
  • New evaluation metrics that capture the linguistic richness of sign language
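
As a rough illustration of the first item, a sign-to-text translator can be framed as a standard sequence-to-sequence transformer whose encoder reads per-frame visual or pose features extracted from the signing video and whose decoder emits text tokens. The sketch below is a minimal, hypothetical PyTorch version; the feature dimensions, vocabulary size, and the assumption of pre-extracted frame features are all illustrative, not the project’s design.

```python
# Hypothetical transformer-based sign-to-text sketch; illustrative only.
import torch
import torch.nn as nn

class SignToText(nn.Module):
    def __init__(self, feat_dim=512, vocab_size=8000, d_model=256):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, d_model)     # frame features -> model space
        self.tok_emb = nn.Embedding(vocab_size, d_model)  # target text tokens
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=3, num_decoder_layers=3,
            batch_first=True,
        )
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, frames, tokens):
        # frames: (batch, n_frames, feat_dim); tokens: (batch, n_tokens)
        # Positional encodings are omitted for brevity.
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        out = self.transformer(self.feat_proj(frames), self.tok_emb(tokens),
                               tgt_mask=causal)
        return self.lm_head(out)  # next-token logits over the text vocabulary

model = SignToText()
frames = torch.randn(2, 120, 512)          # two clips, 120 frames each
tokens = torch.randint(0, 8000, (2, 16))   # shifted target sentences
logits = model(frames, tokens)             # shape: (2, 16, 8000)
```

Training such a model would minimize cross-entropy between these logits and the reference text; the generative sign-production direction and the new evaluation metrics would require different machinery.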

At its core, the initiative emphasizes co-design with the deaf and hard-of-hearing (DHH) community. The researchers will collaborate closely with organizations such as the Indian Sign Language Research and Training Centre (ISLRTC) to ensure that the technologies reflect real-world communication needs.

Ultimately, these advances could power real-time mobile and web applications that enable two-way communication—sign-to-text and text/speech-to-sign—in everyday settings such as hospitals, banks, schools, and railway stations.

By releasing datasets, models, and benchmarks openly, the team hopes to accelerate both research and social impact. Their vision is to help build an inclusive digital ecosystem where ISL users can access education, public services, and employment opportunities on equal terms.