
Data exploration, Playing styles, and Gameplay for Cooperative Partially Observable games: Pictionary as a case study


Kiruthika Kannan

Abstract

Cooperative human-human communication becomes challenging when restrictions such as differences in communication modality and limited time are imposed. In this thesis, we present the popular cooperative social game Pictionary as an online multimodal test bed to explore the dynamics of human-human interactions in such settings. Pictionary is a multiplayer game where the players attempt to convey a word or phrase through drawing. The restriction imposed on the mode of communication gives rise to intriguing diversity and creativity in the players’ responses. To explore player activity in Pictionary, we develop an online browser-based Pictionary application and use it to collect a Pictionary dataset. We conduct an exploratory analysis of the dataset, examining the data across three domains: global session-related statistics, target word-related statistics, and user-related statistics. We also present our interactive dashboard to visualize the analysis results. We identify attributes of player interactions that characterize cooperative gameplay. Using these attributes, we find stable role-specific playing style components independent of game difficulty. In terms of gameplay and the larger context of cooperative partially observable communication, our results suggest that too much interaction or unbalanced interaction negatively impacts game success. Additionally, the playing style components discovered via our analysis align with select player personality types proposed in existing frameworks for multiplayer games. Furthermore, this thesis explores atypical sketch content within the Pictionary dataset. We present various baseline models for detecting such atypical content and conduct a comparative analysis of three of them, namely BiLSTM+CRF, SketchsegNet+, and modified CRAFT. Results indicate that the image segmentation-based deep neural network outperforms recurrent models that rely on stroke features or stroke coordinates as input.

Year of completion: November 2023
Advisor: Ravi Kiran Sarvadevabhatla


Security from uncertainty: Designing privacy-preserving verification methods using Noise


Praguna Manvi

Abstract

Biometric authentication plays an increasingly prominent role in today’s products and services for verifying an individual’s identity. It is not only efficient but also practical, as it establishes a unique link to an individual through their physical and behavioral characteristics [43]. Unlike conventional authentication mechanisms like passwords or documents, biometric traits are inherent to each individual, eliminating the need to memorize additional information [64]. However, the security and privacy of biometric templates used in authentication remain primary concerns, as biometric data is strongly and irrevocably tied to an individual, as emphasized in [42]. In the context of remote authentication, Secure Multiparty Computation (SMC) offers a powerful solution. SMC enables two parties to interactively compute a function using their private inputs without disclosing any information except for the output itself [19]. This approach ensures that biometric template comparison is carried out in a privacy-preserving manner, enhancing both security and privacy in authentication services. In this thesis, we introduce a unique approach to iris, fingerprint, and face verification by incorporating “noise” into the authentication process. In our work, “noise” refers to signals obtained from non-discriminatory or unreliable regions of biometric characteristics. Our extensive empirical evaluation reveals a correlation among noise features, and we leverage this correlation in a novel Secure Two-Party Computation (STPC) design. This STPC design operates on quantified uncertainty between noise features, providing information-theoretic security. Our approach offers low accuracy degradation, practical computational complexity, and wide applicability, making it suitable for practical real-time applications.

Year of completion: December 2023
Advisor: Anoop M Namboodiri


Towards Enhancing Semantic Segmentation in Resource Constrained Settings


Ashutosh Mishra

Abstract

Understanding scene semantics well enough to fully automate decision-making for self-driving cars is becoming a crucial task in computer vision. With recent progress in autonomous driving and the many semantic segmentation datasets proposed for road scene understanding, semantic segmentation of road scenes has become an important problem to tackle. However, training semantic segmentation models is resource-intensive, since it requires multi-GPU training, and this becomes a bottleneck to quickly reproducing and understanding results. This thesis introduces challenges and provides solutions to reduce the training time of segmentation models by introducing two small-scale datasets. Additionally, the thesis explores the potential of employing neural architecture search and automatic pruning techniques to create efficient segmentation modules in resource-constrained settings. Chapter 2 of the thesis introduces the problem of semantic segmentation and discusses some deep learning approaches to supervised semantic segmentation. We briefly discuss the different metrics used and also touch upon the statistics of various datasets available in the literature for training semantic segmentation models. Chapter 3 of the thesis explains the need for a dataset based on the Indian road scenario. Most datasets in the literature are captured in Western settings with well-defined traffic participants, delineated boundaries, etc., assumptions that seldom hold in the Indian setting. We describe the annotation pipeline, along with the quality check framework used to annotate the dataset. Although the IDD dataset [121] caters to the Indian setting, it is still quite resource-intensive in terms of GPU computation. Hence, there is a need for a low-resolution dataset with fewer labels for rapid prototyping. We introduce our proposed datasets and provide a detailed set of experiments and statistical comparisons with existing datasets to substantiate our claim regarding the usefulness of the proposed solution. We also show through experiments that models trained using our datasets can be deployed on low-resource hardware such as the Raspberry Pi. At the end of this chapter, we also look into the significance of the proposed datasets in facilitating challenges at two prominent conferences in 2019: the International Conference on Computer Vision (ICCV) and the National Conference on Computer Vision, Pattern Recognition, Image Processing, and Graphics (NCVPRIPG). These challenges aimed to address semantic segmentation in resource-constrained settings, inviting innovative architectures capable of achieving decent accuracy on the proposed datasets. We also discuss the potential application of these datasets in teaching semantic segmentation through a course of notebooks introducing traditional as well as deep learning-based segmentation methods. These notebooks are plug-and-play: the first three can run on a laptop CPU, while the fourth requires GPU access.

Year of completion: January 2024
Advisor: C V Jawahar, Girish Varma


High-Quality 3D Fingerprint Generation: Merging Skin Optics, Machine Learning and 3D Reconstruction Techniques


Apoorva Srivastava

Abstract

Fingerprints are a widely recognized and commonly used method of identification. Contact-based fingerprints, which involve pressing the finger against a surface to obtain images, are a popular method of capturing fingerprints. However, this process has several drawbacks, including skin deformation, unhygienic conditions, and high sensitivity to the moisture content of the finger. These factors can negatively impact the accuracy of the fingerprint. Moreover, fingerprints are three-dimensional anatomical structures, and two-dimensional fingerprints do not capture the depth information of the finger ridges. While 3D fingerprint capture is less sensitive to skin moisture levels and avoids skin deformation, its adoption is limited by the high cost and system complexity associated with it. The complexity and cost are mainly attributed to the use of multiple cameras, projectors, and sometimes synchronously moving mechanical parts. Photometric stereo offers a promising solution for building low-cost, simple sensors for high-quality 3D capture using only a single camera and a few LEDs. However, the method assumes that the surface being imaged is Lambertian, which is not the case for human fingers. Existing 3D fingerprint scanners based on photometric stereo also assume that the finger is Lambertian, resulting in poor reconstruction results. In this context, we introduce the Split and Knit algorithm (SnK), a 3D reconstruction pipeline based on photometric stereo for finger surfaces. The algorithm reconstructs the ridge-valley pattern and the finger shape separately and then combines them to obtain, for the first time, a 3D fingerprint reconstruction of the full finger with a single camera. To reconstruct the ridge-valley pattern, SnK introduces an efficient way of estimating the direct illumination component using a trained U-Net without extra hardware, which reduces the non-Lambertian nature of the finger image and enables a higher-quality reconstruction of the entire finger surface. To obtain the finger shape using a single camera, the algorithm introduces two novel approaches: (a) using IR illumination and (b) using a mirror and parametric modeling of the finger shape. Finally, we combine the overall finger shape and the ridge-valley point cloud to obtain a 3D finger phalange. The high-quality 3D reconstruction results in better matching accuracy of the captured fingerprints. Splitting the ridge-valley pattern from the finger shape provides an implicit way to convert a 3D fingerprint into a 2D fingerprint, making the SnK algorithm compatible with 2D fingerprint recognition systems. To apply the SnK algorithm to fingerprints, we designed a 3D-printed photometric stereo-based setup that captures contactless finger images and obtains their 3D reconstructions.

Year of completion: August 2023
Advisor: Anoop M Namboodiri


A Holistic Framework for Multimodal Ecosystem of Pictionary


Nikhil Bansal

Abstract

In AI, the ability of an intelligent agent to model human players in games such as Backgammon, Chess and Go has been an important metric in benchmarking progress. Fundamentally, the games mentioned above can be characterized as competitive and zero-sum. In contrast, games such as Pictionary and Dumb Charades fall into the category of ‘social’ games. Unlike competitive games, the emphasis is on cooperative and co-adaptive gameplay in a relaxed setting. Such social games can form the basis for the next wave of game-driven progress in AI. Pictionary™ is a wonderful example of cooperative gameplay to achieve a shared goal in communication-restricted settings. This popular sketch-based guessing game, which we employ as a use case, provides an opportunity to analyze shared-goal cooperative gameplay in restricted communication settings. To enable the study of Pictionary and to understand various aspects of the gameplay, we designed a software ecosystem for a web-based online game of Pictionary dubbed PICTGUESS. To overcome several technological and logistical barriers that the actual game presents, we implemented a simplified setting for PICTGUESS wherein a game consists of a time-limited episode involving two players: a Drawer and a Guesser. The Drawer is tasked with conveying a given target phrase to a counterpart Guesser by sketching on a whiteboard within that time limit. However, some players in Pictionary occasionally draw atypical sketch content. While such content is sometimes relevant in the game context, it can also represent a rule violation and impair the game experience. To address such situations in a timely and scalable manner, we introduce DRAWMON, a novel distributed framework for automatic detection of atypical sketch content in concurrently occurring Pictionary game sessions. We build specialized online interfaces to annotate atypical sketch content, resulting in ATYPICT, the first-ever atypical sketch content dataset. We use ATYPICT to train CANVASNET, a deep neural atypical content detection network. We utilize CANVASNET as a core component of DRAWMON. Our analysis of post-deployment game session data indicates DRAWMON’s effectiveness for scalable monitoring and atypical sketch content detection. Beyond Pictionary, our contributions can also serve as a design guide for customized atypical content response systems involving shared and interactive whiteboards.

Year of completion: September 2023
Advisor: Ravi Kiran Sarvadevabhatla


More Articles …

1. Deploying Multi Camera Multi Player Detection and Tracking Systems in Top View
2. Towards building controllable Text to Speech systems
3. Effective and Efficient Attribute-aware Open-set Face Verification
4. Face Reenactment: Crafting Realistic Talking Heads for Enhanced Video Communication and Beyond