Fingerprint Disentanglement for Presentation Attack Generalization Across Sensors and Materials


Gowri Lekshmy

Abstract

In today's digital era, biometric authentication has become increasingly widespread for verifying a user across a range of applications, from unlocking a smartphone to securing high-end systems. Various biometric modalities such as fingerprint, face, and iris each offer a distinct way to recognize a person automatically. Fingerprints are one of the most prevalent biometric modalities; they are widely utilized in security systems owing to their remarkable reliability, distinctiveness, invariance over time, and user convenience. Automatic fingerprint recognition systems have therefore become a prime target for attackers, who fabricate fingerprints using materials like Play-Doh and gelatin, making them hard to distinguish from live fingerprints. This way of circumventing biometric systems is called a presentation attack (PA). To identify such attacks, a PA detector is added to these systems.

Deep learning-based PA detectors require large amounts of data to distinguish PA fingerprints from live ones. However, significantly less training data exists for novel sensors and materials, so PA detectors do not generalize well when unknown sensors or materials are introduced. It is also extremely challenging to physically fabricate an extensive training dataset of high-quality counterfeit fingerprints generated with novel materials and captured across multiple sensors. Existing fingerprint presentation attack detection (FPAD) solutions improve cross-sensor and cross-material generalization by utilizing style-transfer-based augmentation wrappers over a two-class PA classifier. These solutions generate large artificial datasets for training by using style transfer, which learns the style properties from a few samples obtained from the attacker. They synthesize data by learning the style as a single entity containing both sensor and material characteristics. However, these strategies necessitate learning the entire style upon adding a new sensor for an already known material, or vice versa.

This thesis proposes a decomposition-based approach to improve cross-sensor and cross-material FPAD generalization. We model presentation attacks as a combination of two underlying components, i.e., material and sensor, rather than the entire style. This allows our method to generate synthetic patches upon the introduction of a new sensor, a new material, or both. We perform fingerprint factorization in two different ways: traditional and deep-learning-based. Traditional factorization of fingerprints into sensor and material representations using tensor decomposition establishes a machine-learning baseline for our hypothesis. The deep-learning method uses a decomposition-based augmentation wrapper for disentangling fingerprint style. The wrapper improves cross-sensor and cross-material FPAD utilizing one fingerprint image of the target sensor and material. We also reduce computational complexity by generating compact representations and using fewer combinations of sensors and materials to produce several styles. Our approach enables us to generate a large variety of samples using a limited amount of data, which helps improve generalization.
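To make the factorization idea concrete, the following is a minimal, hypothetical sketch of a mode-wise tensor factorization in the spirit of the traditional baseline described above. The toy tensor, its dimensions, and the HOSVD-style use of per-mode SVDs are illustrative assumptions, not the thesis' actual pipeline.

import numpy as np

# Hypothetical toy data: feature vectors of PA patches indexed by (sensor, material).
n_sensors, n_materials, n_features = 3, 4, 256
X = np.random.rand(n_sensors, n_materials, n_features)

def mode_unfold(T, mode):
    # Unfold a 3-way tensor along the given mode into a matrix.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# HOSVD-style per-mode factors: the left singular vectors of each unfolding
# give separate sensor and material subspaces.
U_sensor, _, _ = np.linalg.svd(mode_unfold(X, 0), full_matrices=False)
U_material, _, _ = np.linalg.svd(mode_unfold(X, 1), full_matrices=False)

# A style for a new (sensor, material) pair can then be approximated by
# recombining one sensor factor with one material factor, instead of
# learning the whole style from scratch.
print(U_sensor.shape, U_material.shape)  # (3, 3) (4, 4)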

Year of completion: June 2023
Advisor: Anoop M Namboodiri

Related Publications


Downloads

thesis

Extending PRT Framework for Lowly-Tessellated and Continuous Surfaces


Dhawal Sirikonda

Abstract

Precomputed Radiance Transfer (PRT) is widely used for real-time photorealistic effects. PRT disentangles the rendering equation into transfer and lighting, enabling their precomputation. Transfer accounts for the cosine-weighted visibility of points in the scene, while lighting is usually distant emitted lighting, e.g., an environment map. Transfer computation involves tracing several rays into the scene from every point on the surface. For every ray, the binary visibility is calculated, and a spherical function is obtained. The spherical function is projected into the Spherical Harmonic (SH) domain. SH is a band-limited representation of spherical functions, and the order of SH decides its representation capacity (the higher the SH order, the better the approximation of a spherical function). The SH domain also facilitates fast and efficient integral computation by simplifying the integral into simple dot products and convolutions.

The original formulation of PRT by Sloan et al. 2002 uses different storage for the transfer: vectors for diffuse materials and matrices for glossy materials. Using matrices for the transfer representation becomes infeasible as the SH order increases. The Triple Product Formulation by Ng et al. in 2004 extended the formulation to allow simple vector-based transfer storage even for glossy materials. Prior art stored the precomputed transfer in a tabulated manner in vertex space, and these values are fetched with interpolation at each point for shading. Since barycentric interpolation is employed to calculate the final color across the geometry away from the vertex locations, vertex-space methods require densely tessellated meshes to obtain accurate radiance. Such high-density (tessellated) meshes can adversely affect runtimes and memory requirements; this is mainly observed in simple geometries with no additional detailing that still demand high triangle counts (e.g., planes, walls, etc.).

Our first work provides a solution by leveraging texture space, which is more continuous than vertex space. We also add functionality to obtain inter-reflection effects in texture space. While texture-space methods provide faithful results on meshes, they require non-overlapping, area-preserving UV mapping and a high-resolution texture to avoid artifacts. In the subsequent work, we propose a compact transfer representation that is learnt directly on scene geometry points. Specifically, we train a small multi-layer perceptron (MLP) to predict the transfer at sampled surface points. Our approach is most beneficial where an inherent mesh storage structure and natural UV mapping are unavailable, such as implicit surfaces, as it learns the transfer values directly on the surface. Using our approach, we demonstrate real-time, photorealistic renderings of diffuse and glossy materials on SDF geometries with PRT.
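As a compact reference for the diffuse case described above, the standard SH-domain PRT formulation can be written as follows (notation chosen here for illustration: y_i are the SH basis functions of an order-n expansion, V is binary visibility, n_x the surface normal, and ρ the diffuse albedo, which is sometimes folded into the transfer vector):

t_i(x) = \int_{\Omega} V(x, \omega)\, \max(n_x \cdot \omega, 0)\, y_i(\omega)\, d\omega      (transfer: cosine-weighted visibility)
l_i = \int_{\Omega} L_{\mathrm{env}}(\omega)\, y_i(\omega)\, d\omega                          (lighting: environment-map projection)
L_o(x) = \frac{\rho}{\pi} \sum_{i=1}^{n^2} t_i(x)\, l_i                                       (shading: a simple dot product of coefficient vectors)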

Year of completion: June 2023
Advisor: P J Narayanan

Related Publications


Downloads

thesis

Virtual World Creation


Aryamaan Jain

Abstract

Synthesis, capture and analysis of highly complex 3D terrain structures are essential for critical applications such as river/flood modelling, disaster mitigation planning, landslide modelling and flight simulation. On the other hand, synthesis of natural-looking 3D terrains finds applications in the entertainment industry, such as computer gaming and VFX. This thesis explores novel learning-based techniques for the generation of immersive and realistic 3D virtual environments, catering to the needs of the aforementioned applications. The generation of virtual worlds involves multiple components, including terrain, vegetation, and other objects. We primarily focus on three key aspects of virtual world generation: 1) novel AI-enabled 3D terrain authoring solutions based on real-world satellite and aerial data using a learning-based framework, 2) L-systems grammar-based 3D tree generation, and 3) rendering techniques used to create high-quality visualizations of the generated world.

Terrain generation is a critical component of 3D virtual world generation, as it provides the foundational structure for the environment. Traditional techniques for 3D terrain generation involve procedural generation, which relies on mathematical algorithms to generate landscapes. However, deep learning techniques have shown promise in generating more realistic terrain, as they can learn from real-world data to produce new, varied, and realistic landscapes. In this thesis, we explore the use of deep learning techniques for 3D terrain generation, which can produce realistic and varied terrains with high visual fidelity. Specifically, we propose two novel learning-based frameworks for Interactive 3D Terrain Authoring & Manipulation and Adaptive Multi-Resolution Infinite Terrain Generation.

In addition to terrain generation, vegetation is another important component of virtual world generation. Trees and other plants provide visual interest and can help create a more immersive environment. L-systems are a popular technique for generating realistic vegetation, as they are capable of generating complex structures that resemble real-world plants. In this thesis, we propose a variant of L-systems for 3D tree generation and compare the results to traditional procedural generation techniques; a generic illustration of the underlying rewriting mechanism is sketched below.

Finally, rendering is a critical component of 3D virtual world generation, as it is responsible for creating the final visual output that users will see. In addition to terrain and tree generation, this thesis therefore also covers rendering techniques used to visualize the generated virtual world. We explore the use of real-time rendering techniques in conjunction with terrain generation to achieve high-quality visual results while maintaining performance. Overall, the research presented in this thesis aims to advance the state of the art in virtual world generation and contribute to the development of more realistic and immersive virtual environments. We performed extensive empirical evaluation on publicly available datasets and report detailed qualitative and quantitative results demonstrating the superiority of the proposed methods over existing solutions in the literature.
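The rewriting mechanism behind L-systems can be shown with a minimal, generic sketch; the grammar below is the classic textbook bracketed plant grammar, not the variant proposed in the thesis.

def expand(axiom, rules, iterations):
    # Apply all production rules to every symbol, in parallel, once per iteration.
    s = axiom
    for _ in range(iterations):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

# Classic bracketed plant grammar: F draws a segment, +/- turn the turtle,
# [ and ] push/pop the turtle state to create branches.
rules = {"X": "F+[[X]-X]-F[-FX]+X", "F": "FF"}
print(expand("X", rules, 3)[:80])  # first 80 symbols of the derived string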

Year of completion: July 2023
Advisors: Avinash Sharma, Rajan Krishnan Sundara

Related Publications


Downloads

thesis

Driving into the Dataverse: Real and Synthetic Data for Autonomous Vehicles


Shubham Dokania

Abstract

The rapid advancement of autonomous driving systems is transforming the future of transportation and urban mobility. However, these systems face significant challenges when deployed in complex and unstructured traffic environments, such as those found in many cities of South Asian countries like India. This thesis addresses the challenges related to data collection, management, and generation for autonomous driving systems operating in such scenarios. The primary contributions of this work include the development of a data collection toolkit, the creation of a comprehensive driving dataset for unstructured environments (IDD-3D), and the proposal of a synthetic data generation framework (TRoVE) based on real-world data.

The data collection toolkit presented in this thesis enables sensor fusion for driving scenarios by treating sensor APIs as separate entities from the data collection interface (a generic sketch of this decoupling follows below). This allows any sensor configuration to be used and enables the smooth creation of driving datasets with our framework. The toolkit is adaptable to different environments and can be easily scaled, making it a crucial step towards creating a large-scale dataset for Indian road scenarios.

The IDD-3D dataset provides a valuable resource for studying unstructured driving scenarios with complex road situations. We present a thorough statistical and experimental analysis of the dataset, which includes high-quality annotations for 3D object bounding boxes and instance IDs for tracking. We highlight the diverse object types, categories, and complex trajectories found in Indian road scenes, enabling the development of robust autonomous driving systems that can generalize across different geographical locations. In addition, we provide benchmarks for 3D object detection and tracking using state-of-the-art approaches.

The TRoVE synthetic data toolkit offers a framework for the automatic generation of synthetic data for visual perception, leveraging existing real-world data. By combining synthetic data with real data, we show the potential for improved performance in various computer vision tasks. The data generation process can be extended to different locations and scenarios, avoiding the limitations of bounded data volume and variety found in manually designed virtual environments.

This thesis contributes to the development of systems for complex environments by addressing the challenges of data acquisition, management, and generation. By bridging the gap between the current state of the art and the needs of unstructured traffic scenarios, this work paves the way for more robust and versatile intelligent transportation systems that can operate safely and efficiently in a wide range of situations.
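The decoupling of sensor APIs from the collection loop mentioned above could look roughly like the following hypothetical sketch; all class and method names here are purely illustrative and are not the toolkit's actual API.

from abc import ABC, abstractmethod

class Sensor(ABC):
    @abstractmethod
    def read(self):
        """Return one timestamped sample from the device."""

class DummyCamera(Sensor):
    def read(self):
        return {"type": "image", "data": b"...", "t": 0.0}

class DummyLidar(Sensor):
    def read(self):
        return {"type": "pointcloud", "data": [], "t": 0.0}

def collect(sensors, steps):
    # The collection loop only sees the Sensor interface, so any sensor
    # configuration can be plugged in without changing this code.
    return [s.read() for _ in range(steps) for s in sensors]

samples = collect([DummyCamera(), DummyLidar()], steps=2)
print(len(samples))  # 4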

Year of completion: July 2023
Advisor: C V Jawahar

Related Publications


Downloads

thesis

Improved Representation Spaces for Videos


Bipasha Sen

Abstract

Videos form an integral part of human lives and act as one of the most natural forms of perception, spanning both the spatial and the temporal dimensions: the spatial dimension emphasizes the content, whereas the temporal dimension emphasizes the change. Naturally, studying this modality is an important area of computer vision, and one must capture this high-dimensional modality efficiently to perform different downstream tasks robustly. In this thesis, we study representation learning for videos to address two key aspects of video-based tasks: classification and generation. In a classification task, a video is compressed to a latent space that captures the key discriminative properties of the video relevant to the task. Generation, on the other hand, involves starting from a latent space (often a known space, such as a standard normal) and learning a valid mapping between the latent space and the video manifold. This thesis explores complementary representation techniques to develop robust representation spaces useful for diverse downstream tasks. It starts by tackling video classification, where we concentrate on the specific task of “lipreading” (transliterating videos to text), or in technical terms, classifying videos of mouth movements. Through this work, we propose a compressed generative space that self-augments the dataset, improving the discriminative capabilities of the classifier. Motivated by the findings of this work, we move on to finding an improved generative space, touching upon several key elements of video generation, including unconditional video generation, video inversion, and video super-resolution.

In the classification task, we study lipreading (visually recognizing speech from the mouth movements of a speaker), a challenging and mentally taxing task for humans to perform. Unfortunately, multiple medical conditions force people to depend on this skill in their day-to-day lives for essential communication. Patients suffering from ‘Amyotrophic Lateral Sclerosis’ (ALS) often lose muscle control and, consequently, their ability to generate speech and communicate via lip movements. Existing large datasets do not focus on medical patients or curate personalized vocabulary relevant to an individual, and collecting the large-scale data of a patient needed to train modern data-hungry deep learning models is extremely challenging. We propose a personalized network designed to lipread for an ALS patient using only one-shot examples. We rely on synthetically generated lip movements to augment the one-shot scenario, and a Variational Encoder-based domain adaptation technique is used to bridge the real-synthetic domain gap. Our approach significantly improves over comparable methods, achieving a top-5 accuracy of 83.2% for the patient compared to 62.6% for comparable methods. Apart from evaluating our approach on the ALS patient, we also extend it to people with hearing impairment, who rely extensively on lip movements to communicate.

In the next part of the thesis, we focus on representation spaces for video-based generative tasks. Generating videos is a complex task that is usually accomplished by generating a set of temporally coherent images frame-by-frame. This approach confines the expressivity of videos to image-based operations on individual frames, necessitating network designs that can achieve temporally coherent trajectories in the underlying image space. We propose INR-V, a video representation network that learns a continuous space for video-based generative tasks. INR-V parameterizes videos using implicit neural representations (INRs): a multi-layer perceptron that predicts an RGB value for each input pixel location of the video. The INR is predicted using a meta-network, a hypernetwork trained on neural representations of multiple video instances. The meta-network can later be sampled to generate diverse novel videos, enabling many downstream video-based generative tasks. Interestingly, we find that conditional regularization and progressive weight initialization play a crucial role in obtaining INR-V. The representation space learned by INR-V is more expressive than an image space, showcasing many interesting properties not possible with existing works. For instance, INR-V can smoothly interpolate intermediate videos between known video instances (such as intermediate identities, expressions, and poses in face videos). It can also in-paint missing portions in videos to recover temporally coherent full videos. We evaluate the space learned by INR-V on diverse generative tasks such as video interpolation, novel video generation, video inversion, and video inpainting against existing baselines. INR-V significantly outperforms the baselines on several of these tasks, clearly showcasing the potential of the proposed representation space.

In summary, this thesis contributes to the field of computer vision by exploring representation learning for videos. The proposed methods are thoroughly evaluated through extensive experimentation and analysis, which demonstrate their advantages over existing works. These findings have the potential to advance a range of video-based applications, including personalized healthcare, entertainment, and communication. By developing robust representation spaces that improve video classification and generation, this work opens up new possibilities for more natural and effective ways of perceiving, understanding, and interacting with videos.
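To ground the INR terminology above, the following is a minimal, hypothetical sketch of a coordinate-based video MLP that maps (x, y, t) to RGB. The layer sizes are illustrative assumptions, and the hypernetwork that predicts these weights in INR-V is not shown.

import torch
import torch.nn as nn

class VideoINR(nn.Module):
    # Tiny coordinate MLP: (x, y, t) -> RGB. Sizes are illustrative only.
    def __init__(self, hidden=256, layers=4):
        super().__init__()
        dims = [3] + [hidden] * layers + [3]
        blocks = []
        for i in range(len(dims) - 1):
            blocks.append(nn.Linear(dims[i], dims[i + 1]))
            if i < len(dims) - 2:
                blocks.append(nn.ReLU())
        self.net = nn.Sequential(*blocks)

    def forward(self, coords):
        return torch.sigmoid(self.net(coords))  # RGB values in [0, 1]

# Query a 2x2 frame at t = 0.5 on a normalized coordinate grid.
xs = torch.linspace(-1, 1, 2)
grid = torch.stack(torch.meshgrid(xs, xs, indexing="ij"), dim=-1).reshape(-1, 2)
coords = torch.cat([grid, torch.full((grid.shape[0], 1), 0.5)], dim=-1)
print(VideoINR()(coords).shape)  # torch.Size([4, 3])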

Year of completion: August 2023
Advisors: C V Jawahar, Vinay P Namboodiri

Related Publications


Downloads

thesis
