
Improving the Efficiency of Fingerprint Recognition Systems


Additya Popli

Abstract

Humans have used distinguishing characteristics to identify each other since early times. This practice of identification based on person-specific features, called biometric traits, has developed over time to use more sophisticated techniques and characteristics like fingerprints, irises and gait in order to improve identification performance. Fingerprints, due to their distinctiveness, persistence over time and ease of capture, have become one of the most widely used biometric traits for identification. However, with this ever-increasing dependence on fingerprint biometrics, it is very important to ensure the security of these recognition systems against potential attackers. One of the most common and successful ways to circumvent these systems is through the use of fake or artificial fingers, synthesized from commonly available materials like silicone and clay to match the real fingerprint of a particular person. Most fingerprint recognition systems employ a spoof detection module to filter out these fake fingerprints. While spoof detectors seem to work well in general, it is well established that they fail to identify spoof fingerprints synthesized from "unseen" or "novel" spoof materials, i.e., materials that were not available during the training phase of the detector. While it is possible to synthesize a few fingers using the various available materials, present-day spoof detectors require a large number of samples for training, which is not practically feasible due to the high cost and complexity of fabricating spoof fingers. In this thesis, we propose a method for creating artificial fingerprint images using only a very limited number of artificial fingers created from a specific material.
We train a style-transfer network on available spoof fingerprint images; the network learns to extract material properties from an image and then, for each material, uses the limited set of spoof fingerprint images to generate a large dataset of artificial fingerprint images without actually fabricating spoof fingers. These artificial fingerprint images can then be used to train the spoof detector. Our experiments show that training on these artificially generated spoof images can improve the performance of existing spoof detectors on unseen spoof materials. Another major limitation of present-day recognition systems is their high resource requirement. Most fingerprint recognition systems use a spoof detector as a separate module, either in series or in parallel with a fingerprint matcher, leading to very high memory and time requirements during inference. To overcome this limitation, we explore the relationship between these two tasks in order to develop a common module capable of performing both spoof detection and matching. Our experiments show a high level of correlation between the features extracted for spoof detection and matching. We propose a new joint model that matches the fingerprint spoof detection and matching performance of current state-of-the-art methods on various datasets while using 50% less time and 40% less memory, thus providing a significant advantage for recognition systems deployed on resource-constrained devices like mobile phones.
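The style-transfer idea can be illustrated with a minimal AdaIN-style feature-statistic transfer (a hypothetical simplification, not the thesis's actual learned network): material appearance is treated as channel-wise feature statistics and imposed on the features of another fingerprint image.

```python
import numpy as np

def material_style_transfer(content_feat, style_feat, eps=1e-5):
    """AdaIN-style statistic matching: re-style content features
    (one fingerprint image) with channel-wise "material" statistics
    taken from a spoof fingerprint image of the target material.
    Both inputs are (channels, height, width) feature arrays."""
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = content_feat.std(axis=(1, 2), keepdims=True)
    s_mean = style_feat.mean(axis=(1, 2), keepdims=True)
    s_std = style_feat.std(axis=(1, 2), keepdims=True)
    # normalize away the content statistics, then impose the
    # material ("style") statistics channel by channel
    normalized = (content_feat - c_mean) / (c_std + eps)
    return normalized * s_std + s_mean
```

Applying this to many live fingerprints with a handful of spoof "style" images is what lets a small set of fabricated fingers seed a large synthetic training set.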

Year of completion: December 2021
Advisor: Anoop M Namboodiri


Interactive Layout Parsing of Highly Unstructured Document Images

Abhishek Trivedi

Abstract

Ancient historical handwritten documents are among the earliest forms of written media and form part of the most valuable cultural and natural heritage of many countries. They hold early written knowledge on subjects like science, medicine, Buddhist doctrine and astrology. India has the most extensive collection of manuscripts, and studies have been conducted on their digitization to pass their wealth of wisdom on to future generations. Targeted annotation systems for automatic multi-region instance segmentation of their document images exist, but with relatively inferior layout prediction compared to their human-annotated counterparts. Precise boundary annotations of image regions in historical document images are crucial for downstream applications like OCR, which rely on region-class semantics. Some document collections contain densely laid out, highly irregular, and overlapping multi-class region instances with a large range of aspect ratios. Addressing this, this thesis proposes a web-based layout annotation and analytics system. The system, called HInDoLA, features an intuitive annotation GUI and a graphical analytics dashboard, and interfaces with machine-learning-based intelligent modules on the backend. HInDoLA has helped us create Indiscapes, the first-ever large-scale dataset for layout parsing of Indic palm-leaf manuscripts. Keeping the non-technical background of domain experts in mind, the tool offers an interactive and relatively fast annotation process through two modes, a Fully Automatic mode and a Semi-Supervised Intelligent mode. We then discuss the superiority of the semi-supervised approach over fully automatic approaches for historical document annotation: fully automatic boundary estimation approaches tend to be data-intensive, cannot handle variable-sized images, and produce sub-optimal results for the images described above.
BoundaryNet, a novel resizing-free approach for high-precision semi-automatic layout annotation, is another main contribution of this thesis. An attention-guided skip network first processes the variable-sized user-selected region of interest. The network's optimization is guided by Fast Marching distance maps to obtain a good-quality initial boundary estimate and an associated feature representation. These outputs are processed by a Residual Graph Convolution Network optimized with a Hausdorff loss to obtain the final region boundary. Results on a challenging manuscript image dataset demonstrate that BoundaryNet outperforms strong baselines and produces high-quality semantic region boundaries. Qualitatively, our approach generalizes across multiple document image datasets containing different script systems and layouts, all without additional fine-tuning.
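The Hausdorff criterion behind the boundary refinement can be sketched as a symmetric worst-case distance between predicted and ground-truth boundary point sets (a minimal numpy version; BoundaryNet itself uses a differentiable variant inside training):

```python
import numpy as np

def hausdorff_distance(pred_pts, gt_pts):
    """Symmetric Hausdorff distance between two boundary polylines,
    each given as an (N, 2) array of (x, y) points."""
    # pairwise Euclidean distances, shape (N_pred, N_gt)
    d = np.linalg.norm(pred_pts[:, None, :] - gt_pts[None, :, :], axis=-1)
    # worst-case nearest-neighbour distance in each direction
    forward = d.min(axis=1).max()   # predicted -> ground truth
    backward = d.min(axis=0).max()  # ground truth -> predicted
    return max(forward, backward)
```

Unlike an averaged loss, this penalizes the single worst boundary point, which is why it suits high-precision region annotation.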

Year of completion: May 2022
Advisor: Ravi Kiran Sarvadevabhatla


Pose-Based Action Recognition: Novel Frontiers for Fully Supervised Learning and Language-Aided Generalised Zero-Shot Learning

Pranay Gupta

Abstract

Action recognition is indispensable not only to the umbrella field of computer vision but also to a multitude of allied fields such as video surveillance, human-computer interaction, robotics and human-robot interaction. Typically, action recognition is performed on RGB videos; in recent years, however, skeleton-based action recognition has also gained a lot of traction, much of it owed to the development of frugal motion-capture systems, which enabled the curation of large-scale skeleton action datasets. In this thesis, we focus on skeleton-based human action recognition. We begin our exploration with skeleton action recognition in the wild by introducing Skeletics-152, a curated and 3-D pose-annotated subset of RGB videos sourced from Kinetics-700, a large-scale action dataset. We extend our study to out-of-context actions by introducing Skeleton-Mimetics, a dataset derived from the recently introduced Mimetics dataset. We also introduce Metaphorics, a dataset with caption-style annotated YouTube videos of the popular social game Dumb Charades and of interpretative dance performances. We benchmark state-of-the-art models on the NTU-120 dataset and provide a multi-layered assessment of the results. Benchmarking the top performers of NTU-120 on the newly introduced datasets reveals the challenges and the domain gap induced by actions in the wild. Overall, our work characterizes the strengths and limitations of existing approaches and datasets, and, via the introduced datasets, enables new frontiers for human action recognition. When moving from traditional supervised recognition to the more challenging zero-shot recognition, the language component of the action name becomes important. Focusing on this, we introduce SynSE, a novel syntactically guided generative approach for Zero-Shot Learning (ZSL).
Our end-to-end approach learns progressively refined generative embedding spaces constrained within and across the involved modalities (visual and language). The inter-modal constraints are defined between the action-sequence embedding and the embeddings of Parts-of-Speech (PoS) tagged words in the corresponding action description. We deploy SynSE for skeleton-based action sequence recognition. Our design choices enable SynSE to generalize compositionally, i.e., to recognize sequences whose action descriptions contain words not encountered during training. We also extend our approach to the more challenging Generalized Zero-Shot Learning (GZSL) problem via a confidence-based gating mechanism. We are the first to present zero-shot skeleton action recognition results on the large-scale NTU-60 and NTU-120 skeleton action datasets with multiple splits. Our results demonstrate SynSE's state-of-the-art performance in both the ZSL and GZSL settings compared to strong baselines on the NTU-60 and NTU-120 datasets. 3-D virtual avatars are extensively used in gaming, educational animation and physical-exercise coaching applications. The development of these visually interactive systems relies heavily on how humans perceive the actions performed by the 3-D avatars. To this end, we perform a short user study to gain insights into the recognizability of human actions performed virtually. Our results reveal that actions performed by 3-D avatars are significantly easier to recognize than those performed by 3-D skeletons. Concrete actions, i.e., actions which can only be performed using a fixed set of movements, are recognized more quickly and accurately than abstract actions. Overall, in this thesis we study various new frontiers in skeleton action recognition by means of novel datasets and tasks.
We drift from unimodal approaches to a multimodal setup by incorporating language in a unique syntactically aware fashion, with the hope of applying similar ideas to more challenging problems like skeleton action generation.
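The confidence-based gate that extends ZSL to the generalized setting can be sketched as follows (a hypothetical simplification with a hand-set threshold; the actual gate and both classifiers are learned):

```python
import numpy as np

def gated_gzsl_predict(seen_probs, zsl_probs, seen_labels, unseen_labels,
                       threshold=0.5):
    """Route a test sample to the seen-class classifier when its
    confidence (max softmax probability) clears the threshold,
    otherwise fall back to the zero-shot branch over unseen classes."""
    if seen_probs.max() > threshold:
        return seen_labels[int(np.argmax(seen_probs))]
    return unseen_labels[int(np.argmax(zsl_probs))]
```

A low-confidence seen-class prediction is taken as a signal that the sample likely belongs to a class never observed during training, so only then is the zero-shot branch trusted.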

Year of completion: May 2022
Advisor: Ravi Kiran Sarvadevabhatla


Exploring Data-Driven Graphics for Interactivity

Aakash KT

Abstract

The goal of rendering in computer graphics is the synthesis of photorealistic images from abstract virtual scene descriptions. Among the many rendering methods, path tracing, along with physically accurate material modeling, has emerged as the industry standard for photorealistic image synthesis. Path tracing is, however, a computationally expensive operation. Moreover, physically accurate material modeling requires iterative parameter tuning with visualization via path tracing. This quickly becomes a bottleneck when tuning the large number of parameters found in most physically accurate material models, and artists end up spending a lot of time waiting for path tracing to converge. This dissertation proposes to leverage the learning power of neural networks to plausibly approximate path tracing for: (a) quick material visualization while editing, and (b) out-of-the-box material recovery using inverse rendering. The traditional workflow of tuning material parameters and visualizing them is typically followed by modifications to the lighting to better understand the behaviour of the material. We therefore incorporate the ability to modify the environment lighting into our neural material visualization framework. This significantly aids in understanding the behaviour of the material, and hence the parameter-tuning process, which we demonstrate with a user study. Most real-world materials are spatially varying, that is, their behaviour varies across the surface of the geometry. While previous works in this area dealt with uniform materials, we propose improvements for handling spatially varying materials. We design our neural network architecture to be lightweight, with 10× fewer trainable parameters than the state of the art, while producing better visualizations. We also build a tool demonstrating real-time parameter tuning and visualization of materials. Finally, we extend the above work to handle general 3-D scenes under specific lighting scenarios.
This is done by training a small fully connected neural network on a simple scene in the local tangent frame. Training in the local tangent frame decouples geometric complexity from training, which means the network generalizes to any geometry. We embed this network in a standard path tracer to evaluate direct illumination at each path vertex. We thereby obtain not only noise-free direct illumination but also out-of-the-box gradients of the rendering process, which can be used for inverse rendering. We show that these gradients can be used to plausibly recover the material of a target scene. We conclude by discussing various practical aspects of our method.
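Working in the local tangent frame is what decouples the network from scene geometry: every shading query is expressed relative to the surface normal before it reaches the network. A minimal sketch of that frame construction (hypothetical helper names; the branchless orthonormal-basis trick follows the well-known Duff et al. construction):

```python
import numpy as np

def tangent_frame(normal):
    """Build an orthonormal basis (tangent, bitangent, normal)
    from a surface normal, without branching on its sign."""
    n = normal / np.linalg.norm(normal)
    s = np.copysign(1.0, n[2])
    a = -1.0 / (s + n[2])
    b = n[0] * n[1] * a
    t = np.array([1.0 + s * n[0] * n[0] * a, s * b, -s * n[0]])
    bt = np.array([b, s + n[1] * n[1] * a, -n[1]])
    return t, bt, n

def to_local(v, frame):
    """Express a world-space direction in the local tangent frame,
    so a network trained this way never sees absolute geometry."""
    t, bt, n = frame
    return np.array([np.dot(v, t), np.dot(v, bt), np.dot(v, n)])
```

In this convention the surface normal always maps to (0, 0, 1), so light and view directions from any geometry land in the same canonical input space.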

Year of completion: April 2022
Advisor: P J Narayanan


A Study of Automatic Segmentation of 3-D Brain MRI and its Application to Deformable Registration

Aabhas Majumdar

Abstract

Automated segmentation of cortical and non-cortical human brain structures has hitherto been approached using non-rigid registration followed by label fusion. We propose an alternative approach using a convolutional neural network (CNN) that classifies each voxel into one of many structures. Four different kinds of two-dimensional and three-dimensional intensity patches are extracted for each voxel, providing local and global (context) information to the CNN. The proposed approach is evaluated on five publicly available datasets which differ in the number of labels per volume. The obtained mean Dice coefficient varies with the number of labels: for example, it is 0.844 ± 0.031 and 0.743 ± 0.019 for the datasets with the fewest (32) and the most (134) labels, respectively. These figures are marginally better than or on par with those obtained by current state-of-the-art methods on nearly all datasets, at reduced computational time. The consistently good performance of the proposed method across datasets, and the fact that it requires no registration, make it attractive for the many applications where reduced computational time is necessary. Feature-based registration has been popular, with features ranging from voxel intensity to the Self-Similarity Context (SSC). In the second part of the thesis, we examine how features learnt by various Deep Learning (DL) frameworks for segmentation can be used for deformable registration, and whether this feature learning is necessary at all. We investigate the use of features learned by different DL methods in the current state-of-the-art discrete registration framework and analyze its performance on two publicly available datasets. We draw insights about the type of DL framework useful for feature learning, and consider the impact, if any, of the complexity of different DL models and brain parcellation methods on the performance of discrete registration.
Our results indicate that registration performance with DL features and with SSC is comparable and stable across datasets, whereas this does not hold for low-level features.
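The Dice coefficient used to report segmentation quality measures volumetric overlap between predicted and reference label maps; a minimal version for a single structure's binary mask:

```python
import numpy as np

def dice_coefficient(pred_mask, gt_mask, eps=1e-8):
    """Dice overlap between two binary segmentation masks:
    2|A ∩ B| / (|A| + |B|), ranging from 0 (disjoint) to 1 (identical)."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    # eps guards against division by zero when both masks are empty
    return 2.0 * intersection / (pred.sum() + gt.sum() + eps)
```

A multi-label volume is scored by computing this per structure and averaging, which is how per-dataset means such as 0.844 arise.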

Year of completion: April 2022
Advisor: Jayanthi Sivaswamy

