Prof. Ravi Kiran Sarvadevabhatla, who heads the computer vision activities of the BharatGen initiative at IIIT-Hyderabad, describes one of the first successful uses of vision-language models in India: e-commerce. "Normally, we are consumers in the online world, choosing a product, putting it in the cart, and clicking 'Buy'. But it's a very different experience to be a seller on the same website," he says.
For small or first-time entrepreneurs and sellers, putting up a product on the web is a complex and time-consuming affair. Apart from the initial registration, sellers have to upload two pictures of their product and fill out long forms giving its details, specifications, and categories. "There is a form that needs to be filled in; it is a lot of writing and can be daunting for people who don't speak English," Prof. Ravi Kiran explains.
To address this, the BharatGen team created a vision-language model that automates product listing.
Starting from the uploaded images, the model generates product descriptions and metadata in context, reducing the manual data entry required, particularly for sellers with limited English proficiency. This not only simplifies onboarding but also opens up e-commerce to a more diverse set of sellers spread across India.
“Our technology might be generating the product description automatically, but it is important to communicate this content to the sellers in an Indic language of their choice so that they know exactly how their product is being described,” explains Prof. Ravi Kiran. This kind of accessibility across languages is the aim of BharatGen.
The project is being executed by the TIH Foundation for IoT and IoE at IIT Bombay, in collaboration with several leading academic institutions, including IIT Bombay, IIIT Hyderabad, IIT Mandi, IIT Kanpur, IIM Indore, and IIT Madras.
At BharatGen's virtual launch, Union Minister Dr. Jitendra Singh called the initiative “a proud example of India’s commitment to homegrown technologies.”
The e-vikrAI use case was selected as an exhibit at the prestigious India Mobile Congress (IMC) 2024. The technology drew considerable interest from visitors, including prominent government officials and tech entrepreneurs.
BharatGen will capture the diversity and cultural heritage of our country through the Bharat Datasagar. This project will not only symbolise technological advancement but will also become a means of bringing Indian local knowledge and experiences into the mainstream of AI.
“Large, pre-trained multimodal models can be a game changer in improving the productivity and ease of usage in several situations. They can also enhance the access to a lot of services to those who are not proficient in English. That is what e-vikrAI tries to do. This is just a beginning and the tools developed by the BharatGen effort will bring advanced AI technology to practically every Indian in the future,” said Prof. P. J. Narayanan, Director of IIITH.
By supporting both text and speech, BharatGen will cover the vast linguistic diversity of India. Its multilingual datasets will capture the intricacies of Indian languages, which are often overlooked in global AI models. The project, expected to be completed by 2026, will benefit government, private, and academic institutions, fostering AI research and innovation in the country.