Effective and Efficient Attribute-aware Open-set Face Verification
Arun Kumar Subramanian
While face recognition and verification in controlled settings are already a solved problem for machines, the face is unique as a biometric in that its mode of capture is highly diverse. A face may be captured up close or at a distance, in different poses, under different lighting, and by different devices. Face recognition and verification must overcome several challenges to perform effectively under these varying conditions. Most current methods try to find the salient features of an individual while ignoring these variations. This can be viewed through the paradigm of signal and noise. The signal is the information that is unique to an individual and does not vary with the capture conditions. The noise is everything that is unrelated to identity itself and is influenced by the capture mechanism, the physical setting, and so on. Separating the two is usually done through metric learning approaches in addition to loss functions such as cross-entropy (e.g., Siamese networks, angular loss, and other margin losses such as ArcFace).
Some aspects lie between signal and noise, such as facial attributes (e.g., eyeglasses). These may or may not be unique to the individual subject, but they introduce artifacts into the face image. The question then arises: why can't these variations be detected using learning methods, and the knowledge thus gained be put to good use during the matching process? This curiosity has led to aggregation strategies for matching, which were previously implemented for aspects such as pose and age. In the wild, however, humans show significant variability in facial attributes such as facial hair, eyeglasses, hairstyles, and make-up. This is common because one of the primary mechanisms of face image acquisition is covert capture in public (with ethics of consent in place), where people usually display significant variability in facial attributes.
Hence it is very important to address this variability during the matching process, and this work attempts to do so. The first question is whether matching performance indeed varies when the attribute prior is known. If it does, how does one design a system that exploits this? To this end, this thesis proposes two frameworks. The first uses configuration-specific operating points, and the second suppresses attribute information in the face embedding prior to matching. Attribute suppression is attempted both directly at the final embedding and at the intermediate layers of a Vision Transformer deep neural network. Both frameworks require the facial attributes of each image to be detected before the images are passed into the proposed framework for matching.
This naturally adds another task to the face verification pipeline. It is therefore necessary to find efficient and effective ways of performing face attribute detection (and face template generation), since performing these parts efficiently mitigates the overhead of the expanded pipeline and makes it a viable option for face verification. We observe that face attribute detection usually employs end-to-end networks, which results in a large number of parameters at inference. A feasible alternative is to continually leverage the current state-of-the-art (SOTA) face recognition networks and use their earlier feature layers to perform the face attribute classification task. Since the most accurate SOTA methods for faces are currently deep neural networks (DNNs), this thesis deals with DNNs. More narrowly, we focus on open-set face verification, where DNNs aim to find a unique representation even for subjects not used in training.
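As a rough illustration of the idea of configuration-specific operating points, the sketch below picks a different decision threshold depending on the detected attributes of the two images being compared. It is only a minimal sketch under stated assumptions, not the method of the thesis: it assumes a "configuration" is the unordered pair of attribute labels, that matching uses cosine similarity between embeddings, and that the attribute labels (`"eyeglasses"`, `"none"`), the threshold values, and the function names are all hypothetical placeholders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical operating points. In practice each threshold would be tuned on
# validation pairs that share the same attribute configuration.
DEFAULT_THRESHOLD = 0.50
CONFIG_THRESHOLDS = {
    # One image with eyeglasses, one without: assume genuine scores drop,
    # so a lower threshold is used for this configuration.
    frozenset(["eyeglasses", "none"]): 0.42,
}


def verify(emb_a, attr_a, emb_b, attr_b):
    """Return (is_match, score, threshold) for one verification pair.

    attr_a and attr_b are attribute labels assumed to come from a separate
    attribute detector that runs before matching.
    """
    config = frozenset([attr_a, attr_b])  # unordered attribute pair
    threshold = CONFIG_THRESHOLDS.get(config, DEFAULT_THRESHOLD)
    score = cosine_similarity(emb_a, emb_b)
    return score >= threshold, score, threshold


# Toy 2-D embeddings with a cosine similarity of roughly 0.45.
emb_a = [1.0, 0.0]
emb_b = [0.45, 0.893]

mixed = verify(emb_a, "eyeglasses", emb_b, "none")
same = verify(emb_a, "none", emb_b, "none")
```

With these made-up numbers, the same pair of embeddings is accepted under the mixed eyeglasses configuration and rejected under the default threshold, which is the kind of behavior a per-configuration operating point is meant to allow.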
Year of completion:
Anoop M Namboodiri