Neural and Multilingual Approaches to Machine Translation for IndianLanguages and its Applications

Jerin Philip

Abstract

Neural Machine Translation (NMT), together with multilingual formulations have arisen as the de-facto standard in translating a sentence from a source language to a target language. However, unlike many western languages, the available resources like training data of parallel sentences ortrained models which can be used to build and demonstrate applications in other domains are limited for the languagesin the Indian subcontinent. This work takes a major step towards closing this gap.In this work, we describe the development of state-of-the art translation solutions for 10 Indian lan-guages and English. We do this in four parts described below:1.Considering the Hindi-English language pair we successfully develop an NMT solution for anarrow-domain, demonstrating its application in translating cricket commentary.2.Through heavy data augmentation, we extend the above to the general domain and build a state-of-the art MT system for Hindi-English language pair. Further, We extend to five more languagesby taking advantage of multiway formulations.3.WedemonstratetheapplicationoftheNMTincontributingmoreresourcestothealreadyresource-scarce field, expanding to 11 langauges and its application in a multimodal task of translating a talking face to a target language with lip synchronization.4.Next, we improve both data-situation and performance for machine translation in 11 Indian Lan-guages iteratively to place our models in a standardized, comparable set of metrics setting up forfuture advances in the space to comprehensively evaluate and compare against.

Year of completion:	August 2020
Advisor :	C V Jawahar, Vinay P. Namboodiri