Learning Emotions and Mental States in Movie Scenes

Dhruv Srivastava

Abstract

In this thesis, we analyze movie narratives with a specific focus on understanding the emotions and mental states of characters within a scene. We predict a diverse range of emotions both for each movie scene as a whole and for each character in that scene. To this end, we introduce EmoTx, a multimodal Transformer-based architecture that jointly processes video, multiple characters, and dialogue to make these predictions. Using annotations from the MovieGraphs dataset, EmoTx is designed to predict both classic emotions (e.g., happiness, anger) and other mental states (e.g., honesty, helpfulness). Our experiments evaluate performance on three label sets: the ten most common labels, the twenty-five most common labels, and a mapping that groups 181 labels into 26 categories. Through systematic ablation studies and comparisons against established emotion recognition methods, we demonstrate the effectiveness of EmoTx in capturing emotions and mental states in movie contexts. An analysis of EmoTx's attention patterns further shows that when characters express strong emotions, the model focuses on character-related elements, whereas for other mental states it relies more on video and dialogue cues. This behavior makes EmoTx's predictions easier to interpret in the context of movie story analysis. Overall, the findings presented in this thesis advance our understanding of character emotions and mental states in cinematic narratives.

Year of completion:	April 2024
Advisor:	Makarand Tapaswi