
Quality Beyond Perception: Introducing Image Quality Metrics for Enhanced Facial and Fingerprint Recognition


Prateek Jaiswal

Abstract

Assessing the quality of biometric images is key to making recognition technologies more accurate and reliable. Our research began with fingerprint recognition systems and later expanded to facial recognition systems, underscoring the importance of image quality in both areas. For fingerprint recognition, image quality is vital for accuracy. We developed the Fingerprint Recognition-Based Quality (FRBQ) metric, which addresses the limitations of the NFIQ2 model. FRBQ leverages deep learning algorithms in a weakly supervised setting, using matching scores from DeepPrint, a fixed-length fingerprint representation model. Each score is labeled to reflect the robustness of fingerprint image matches, providing a comprehensive metric that captures diverse perspectives on image quality. Comparative analysis with NFIQ2 reveals that FRBQ correlates more strongly with recognition scores and performs better on challenging fingerprint images. Tested on the FVC 2004 dataset, FRBQ has proven effective in assessing fingerprint image quality.

After our success with fingerprint recognition, we turned to facial recognition systems. In facial recognition, image quality involves more than perceptual aspects; it includes features that convey identity information. Existing datasets account for factors such as illumination and pose, which enhance robustness and performance, but age variations and emotional expressions can still pose challenges. To tackle these, we introduced the Unified Tri-Feature Quality Metric (U3FQ), a framework that combines age variance, facial expression similarity, and congruence scores from advanced recognition models such as VGG-Face, ArcFace, FaceNet, and OpenFace. U3FQ uses a regression network specifically designed for facial image quality assessment. We compared U3FQ with general image quality assessment techniques such as BRISQUE, BLIINDS-II, and RankIQA, as well as specialized facial image quality methods such as PFE, SER-FIQ, and SDD-FIQA. Our results, supported by analyses such as DET plots, expression-match heat maps, and EVRC curves, demonstrate U3FQ's effectiveness.

Our study highlights the transformative potential of artificial intelligence in biometrics, capturing critical details that traditional methods might miss. By providing precise quality assessments, we emphasize its role in advancing both fingerprint and facial recognition systems. This work sets the stage for further research and innovation in biometric analysis, underlining the importance of image quality in improving recognition technologies.
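The weak-supervision idea behind FRBQ, using matching scores from a fixed-length matcher as proxy quality labels for a small regression network, can be illustrated with a short sketch. This is an illustrative reconstruction rather than the thesis's implementation: the embed function, the pseudo-label definition, and the network sizes are all placeholder assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

def pseudo_quality_label(embed, img, mates):
    # Label an image by how robustly it matches other impressions of the same finger
    # ('embed' stands in for a DeepPrint-like fixed-length matcher; placeholder definition).
    e = F.normalize(embed(img), dim=-1)
    scores = torch.stack([F.normalize(embed(m), dim=-1) @ e for m in mates])
    return scores.mean()  # higher mean match score -> higher pseudo quality label

class QualityRegressor(nn.Module):
    # Small CNN mapping a grayscale fingerprint image to a scalar quality score in [0, 1].
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, x):
        return torch.sigmoid(self.head(self.features(x).flatten(1)))

# Training step (illustrative): regress the predicted quality onto the pseudo label, e.g.
# loss = F.mse_loss(model(img).squeeze(), pseudo_quality_label(embed, img, mates))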

Year of completion: July 2024
Advisor: Anoop M Namboodiri

Related Publications


    Downloads

    thesis

    Towards Label Free Few Shot Learning: How Far Can We Go?


    Aditya Bharti

    Abstract

    Deep learning frameworks have consistently pushed the state of the art across problem domains such as computer vision and natural language processing. Such performance improvements have only been made possible by the increasing availability of labeled data and computational resources, which makes applying these systems to low-data regimes extremely challenging. Computationally simple systems that are effective with limited data are essential to the continued proliferation of DNNs into more problem spaces. In addition, generalizing from limited data is a crucial step toward more human-like machine intelligence. Reducing the label requirement is an active and worthwhile area of research, since obtaining large amounts of high-quality annotated data is labor-intensive and often impossible, depending on the domain. There are various approaches to this: artificial generation of extra labeled data, using existing information (other than labels) as supervisory signals for training, and designing pipelines that learn from only a few samples. We focus our efforts on the last class of approaches, which aims to learn from limited labeled data, also known as few-shot learning.

    Few-shot learning systems aim to generalize to novel classes given very few novel examples, usually one to five. Conventional few-shot pipelines use labeled data from the training set to guide training and then aim to generalize to the novel classes, which have limited samples. However, such approaches only shift the label requirement from the novel to the training dataset; in low-data regimes, where there is a dearth of labeled data, it may not be possible to get enough training samples. Our work aims to alleviate this label requirement by using no labels during training, and we examine how much performance is achievable with extremely simple pipelines. Our contributions are hence twofold: (i) we present a more challenging label-free few-shot learning setup and examine how much performance can be squeezed out of a system without labels; (ii) we propose a computationally and conceptually simple pipeline to tackle this setting. We address both the compute and data requirements by leveraging self-supervision for training and image similarity for testing.
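A minimal sketch of the recipe described above (self-supervision for training, image similarity for testing): a frozen self-supervised encoder embeds the few labeled support images and each query, and queries are assigned to the nearest class prototype by cosine similarity. The encoder, tensor shapes, and prototype averaging are illustrative assumptions, not the exact pipeline from the thesis.

import torch
import torch.nn.functional as F

@torch.no_grad()
def few_shot_predict(encoder, support_imgs, support_labels, query_imgs):
    # support_imgs: (N*K, C, H, W); support_labels: (N*K,); query_imgs: (Q, C, H, W)
    s = F.normalize(encoder(support_imgs), dim=-1)   # (N*K, D) support embeddings
    q = F.normalize(encoder(query_imgs), dim=-1)     # (Q, D) query embeddings
    classes = support_labels.unique()
    # Class prototype = mean embedding of that class's support samples.
    protos = F.normalize(torch.stack([s[support_labels == c].mean(0) for c in classes]), dim=-1)
    sims = q @ protos.t()                            # cosine similarities, shape (Q, N)
    return classes[sims.argmax(dim=1)]               # predicted class per query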

    Year of completion: January 2024
    Advisor: C V Jawahar, Vineeth Balasubramanian

    Related Publications


      Downloads

      thesis

       

      A Deep Learning Paradigm for Fingerprint Recognition: Harnessing U-Net Architecture for Fingerprint Enhancement and Representation Learning


      Ekta Gavas

      Abstract

      Biometric technology has long relied on fingerprint recognition as a trusted means of identity verification. While fingerprint enhancement shares similarities with general image denoising, the unique properties of biometric data make it crucial to treat fingerprints differently from typical real-world images. Deep learning methods have the capacity to capture and improve complex patterns and subtle details in fingerprint data, offering a comprehensive solution. This thesis proposes a deep learning-based method, specifically leveraging the U-Net architecture, to achieve superior fingerprint enhancement. At its core, the approach employs a multi-task deep network with domain knowledge from minutiae and orientation fields for the enhancement of rolled and plain fingerprint images. We investigate different configurations of the U-Net architecture, examine the effects of various architectural elements, and present extensive experimental evidence to support the effectiveness of the proposed approach.

      Subsequently, we explore a self-supervised learning paradigm for robust feature representation learning with fingerprint biometrics. In this direction, we introduce a pre-training technique that utilizes the information learned from the enhancement task: we adopt the enhancement-pretrained encoder for learning fixed-length fingerprint embeddings. We evaluate the learned embeddings on the verification task against standard self-supervised techniques, without the explicit need for enhanced fingerprints. By merging the capabilities of deep learning with the specificity of fingerprint features, the proposed paradigm offers significant improvements in the robustness and accuracy of fingerprint verification. Experimental evaluations on standard benchmarks such as the NIST and FVC datasets demonstrate the efficacy of this approach. We present compelling evidence that our methodology not only improves the quality of fingerprint images but also facilitates more effective embedding extraction, ultimately leading to enhanced recognition performance. This work highlights the promising role that deep learning, tailored to biometrics, can play in advancing fingerprint recognition. In addition, it opens up avenues for future research in biometrics, particularly in optimizing computational efficiency and in tackling challenges related to partial and distorted fingerprints.
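As a rough illustration of the multi-task design described above (a shared U-Net trunk with enhancement, orientation-field, and minutiae outputs), a compact PyTorch sketch follows. The depth, channel widths, and output heads are placeholders, not the thesis's actual architecture.

import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU())

class MultiTaskUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1, self.enc2 = conv_block(1, 32), conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)            # 32 decoder + 32 skip channels
        self.enhance_head = nn.Conv2d(32, 1, 1)   # enhanced fingerprint image
        self.orient_head = nn.Conv2d(32, 2, 1)    # sin/cos of the ridge orientation field
        self.minutiae_head = nn.Conv2d(32, 1, 1)  # minutiae probability map

    def forward(self, x):
        e1 = self.enc1(x)                                      # full-resolution features
        e2 = self.enc2(self.pool(e1))                          # half-resolution features
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))    # decode with skip connection
        return self.enhance_head(d1), self.orient_head(d1), self.minutiae_head(d1)

In such a design, the auxiliary orientation and minutiae outputs are what inject fingerprint domain knowledge during training; the encoder half of the network could then be reused for fixed-length embedding learning, in the spirit of the pre-training strategy the abstract describes.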

      Year of completion: February 2024
      Advisor: Anoop M Namboodiri

      Related Publications


        Downloads

        thesis

         

        Physical Adversarial Attacks on Face Presentation Attack Detection Systems


        Sai Amrit Patnaik

        Abstract

        In the realm of biometric security, face recognition technology plays an increasingly pivotal role. However, as its adoption grows, so does the need to safeguard it against adversarial attacks. Presentation attacks involve presenting images of a person printed on a medium or displayed on a screen, and detecting them relies on identifying artifacts introduced during the printing or display and capture process. Adversarial attacks, in contrast, try to deceive the learning strategy of a recognition system using slight modifications to the captured image. Evaluating the risk level of adversarial images is essential for safely deploying face authentication models in the real world. Among these threats, physical adversarial attacks present a particularly insidious challenge to face antispoofing systems. Popular approaches for physical-world attacks, such as print or replay attacks, suffer from limitations, including physical and geometrical artifacts. The presence of a physical process (printing and capture) between image generation and the presentation attack detection (PAD) module makes traditional adversarial attacks non-viable. Digital adversarial attacks have recently gained traction; however, while most previous research assumes that the adversarial image can be fed digitally into the authentication system, this is not always the case for systems deployed in the real world.

        This thesis delves into the domain of physical adversarial attacks on face antispoofing systems, aiming to expose their vulnerabilities and implications. Our research presents novel methodologies, using both white-box and black-box approaches, to craft adversarial inputs capable of deceiving even robust face antispoofing systems. Unlike traditional adversarial attacks that manipulate digital inputs, our approach operates in the physical domain, where printed images and replayed videos are used to mimic real-world presentation attacks. By dissecting and understanding the vulnerabilities inherent in face antispoofing systems, we can develop more resilient defenses, contributing to the security of biometric authentication in an increasingly interconnected world. This thesis not only highlights the pressing need to address these vulnerabilities but also motivates a pioneering approach, exploring a simple yet effective attack strategy to advance the state of the art in face antispoofing security.
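The gap between digital perturbations and the print-and-capture process noted above is commonly bridged with expectation over transformations (EOT); the sketch below shows that generic recipe, not the thesis's specific white-box or black-box method. pad_model (a spoof/live classifier), random_print_capture (a simulated print-and-recapture distortion), and all hyperparameters are assumed placeholders.

import torch
import torch.nn.functional as F

def eot_attack(pad_model, img, target, random_print_capture, steps=40, eps=8/255, alpha=1/255):
    # Craft an L-infinity-bounded perturbation whose effect survives simulated
    # print/capture distortions by averaging the loss over random transformations.
    adv = img.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = sum(F.cross_entropy(pad_model(random_print_capture(adv)), target)
                   for _ in range(8)) / 8
        grad, = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv = adv - alpha * grad.sign()              # step toward the target label
            adv = img + (adv - img).clamp(-eps, eps)     # project back into the eps ball
            adv = adv.clamp(0, 1).detach()               # keep a valid image
    return adv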

        Year of completion: February 2024
        Advisor: Anoop M Namboodiri

        Related Publications


          Downloads

          thesis

          Text-based Video Question Answering


          Soumya Shamarao Jahagirdar

          Abstract

          Think of a situation where you put yourself in the shoes of a visually impaired person who wants to buy an item from a store, or of a person sitting at home watching the news on television and wanting to know about the content being broadcast. Motivated by many such situations that call for systems capable of understanding and reasoning over textual content in videos, this thesis tackles the novel problem of text-based video question answering. Vision and language are broadly regarded as cornerstones of intelligence. Each has a different aim: language serves communication and the transmission of information, while vision constructs mental representations of the scene around us so that we can navigate and interact with objects. Studying the two fields jointly results in applications, tasks, and methods that go beyond the scope of either used individually. This interdependency is studied as a newly emerging area named "multi-modal understanding". Tasks such as image captioning, visual question answering, video question answering, and text-video retrieval fall under the category of multi-modal understanding and reasoning. To build a system that can reason over both textual and temporal information, we propose a new task.

          The first portion of this thesis focuses on formulating the text-based VideoQA task, first analyzing current datasets and works and thereby arriving at the need for text-based VideoQA. To this end, we propose the NewsVideoQA dataset, in which question-answer pairs are framed on the text present in news videos. As this is a newly proposed task, we experiment with existing methods such as text-only models, single-image scene-text models, and video question-answering models. Since these baselines were not originally designed for video question answering over text in videos, the need arose for a model that takes the text in videos into account when producing answers. To this end, we repurpose an existing VideoQA model to incorporate OCR tokens, namely OCR-aware SINGULARITY, a video question-answering framework that learns joint representations of videos and OCR tokens at the pretraining stage and also uses the OCR tokens at the finetuning stage.

          In the second portion of the thesis, we look into the M4-ViteVQA dataset, which targets the same task of text-based video question answering but with videos from multiple categories such as shopping, traveling, vlogging, and gaming. We perform an exploratory data analysis of both NewsVideoQA and M4-ViteVQA, examining several aspects to identify limitations in these datasets. Through this analysis, we show that most questions in both datasets can be answered just by reading the text present in the videos, and that most can be answered using a single frame or only a few frames. We perform an exhaustive analysis of a text-only model, BERT-QA, which obtains results comparable to multimodal methods, and we run cross-domain experiments to check whether training on one category of videos followed by finetuning on another helps the target dataset. Finally, we provide insights into dataset creation and how certain types of annotations can help the community build better datasets in the future. We hope this work motivates future research on text-based video question answering across multiple video categories. Furthermore, pretraining strategies and combined representation learning from videos and the multiple modalities they provide will help create scalable systems and drive future research towards better datasets and creative solutions.
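To make the text-only baseline concrete, the sketch below shows one plausible way to run an extractive BERT-style QA model over OCR tokens gathered from sampled frames, in the spirit of the BERT-QA analysis described above. The checkpoint name and the frame_ocr input format are illustrative assumptions, not the exact setup used in the thesis.

from transformers import pipeline

# Generic extractive QA model; any SQuAD-style checkpoint would do here.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

def answer_from_ocr(question, frame_ocr):
    # frame_ocr: list of lists of OCR strings, one inner list per sampled video frame.
    context = " ".join(token for frame in frame_ocr for token in frame)
    result = qa(question=question, context=context)
    return result["answer"], result["score"]

# e.g. answer_from_ocr("Who is the news anchor?", ocr_tokens_per_frame)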

          Year of completion: March 2024
          Advisor: C V Jawahar

          Related Publications


            Downloads

            thesis

            More Articles …

            1. Analytic and Neural Approaches for Complex Light Transport
            2. Learning Emotions and Mental States in Movie Scenes
            3. Revolutionizing TV Show Experience: Using Recaps for Multimodal Story Summarization
            4. Revisiting Synthetic Face Generation for Multimedia Applications