Understanding Stories by Joint Analysis of Language and Vision

 

Abstract:

Humans spend a large amount of time listening to, watching, and reading stories. We argue that the ability to model, analyze, and create new stories is a stepping stone towards strong AI. We thus work on teaching AI to understand stories in films and TV series. To obtain a holistic view of the story, we align videos with novel sources of text such as plot synopses and books. Plot synopses summarize the core story and provide a high-level overview. In contrast, books provide rich details about characters, scenes, and interactions, allowing us to ground visual information in the corresponding textual descriptions. We also test machine understanding of stories by asking the system to answer questions about them. To this end, we create a large benchmark dataset of almost 15,000 questions from 400 movies and explore its characteristics with several baselines.

Bio:

Makarand Tapaswi received his undergraduate education in Electronics and Communications Engineering from NITK Surathkal. Thereafter he pursued an Erasmus Mundus Master's program in Information and Communication Technologies at UPC Barcelona and KIT Germany. He continued with the Computer Vision lab at Karlsruhe Institute of Technology in Germany and recently completed his PhD. He will join the University of Toronto as a post-doctoral fellow starting in October.