ETL: Efficient Transfer Learning for Face Tasks


Thrupthi Ann John[1], Isha Dua[1], Vineeth N Balasubramanian[2] and C.V. Jawahar[1]

IIIT Hyderabad[1] IIT Hyderabad[2]

[ Video ]   | [ PDF ]


Pipeline for efficiently transferring parameters from a model trained on a primary task such as face recognition to models for secondary tasks including gender, emotion, head pose, and age in a single pass. The ETL technique identifies and preserves only the task-related filters, which results in a highly sparse network for efficient training of face-related tasks.

 

Abstract

Transfer learning is a popular method for obtaining deep trained models for data-scarce face tasks such as head pose and emotion. However, current transfer learning methods are inefficient and time-consuming as they do not fully account for the relationships between related tasks. Moreover, the transferred model is large and computationally expensive. As an alternative, we propose ETL: a technique that efficiently transfers a pre-trained model to a new task by retaining only 'cross-task aware filters', resulting in a sparse transferred model. We demonstrate the effectiveness of ETL by transferring VGGFace, a popular face recognition model, to four diverse face tasks. Our experiments show that we attain a size reduction of up to 97% and an inference time reduction of up to 94% while retaining 99.5% of the baseline transfer learning accuracy.
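The exact filter-selection criterion is described in the paper; purely as an illustration of the idea of keeping only task-relevant filters, the sketch below scores the convolutional filters of a pre-trained backbone on a few secondary-task batches and zeroes out the rest. The gradient-magnitude score, `keep_ratio`, and the secondary-task loader are assumptions, not the paper's actual procedure.

```python
# Minimal sketch of filter-level pruning for efficient transfer, using a
# gradient-magnitude importance score as a hypothetical stand-in for the
# paper's cross-task aware filter selection.
import torch
import torch.nn as nn

def filter_importance(model, secondary_loader, loss_fn, device="cpu"):
    """Accumulate |gradient| per conv filter over a few secondary-task batches."""
    scores = {name: torch.zeros(m.out_channels)
              for name, m in model.named_modules() if isinstance(m, nn.Conv2d)}
    model.to(device).train()
    for images, labels in secondary_loader:
        model.zero_grad()
        loss_fn(model(images.to(device)), labels.to(device)).backward()
        for name, m in model.named_modules():
            if isinstance(m, nn.Conv2d) and m.weight.grad is not None:
                # Sum gradient magnitude over each output filter.
                scores[name] += m.weight.grad.abs().sum(dim=(1, 2, 3)).cpu()
    return scores

def prune_filters(model, scores, keep_ratio=0.1):
    """Zero out the least important filters, yielding a sparse transferred model."""
    for name, m in model.named_modules():
        if isinstance(m, nn.Conv2d):
            k = max(1, int(keep_ratio * m.out_channels))
            keep = torch.topk(scores[name], k).indices
            mask = torch.zeros(m.out_channels, dtype=torch.bool)
            mask[keep] = True
            with torch.no_grad():
                m.weight[~mask] = 0.0
                if m.bias is not None:
                    m.bias[~mask] = 0.0
    return model
```

The sparse model can then be fine-tuned on the secondary task; in the actual ETL pipeline the retained filters are the cross-task aware ones identified in a single pass.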

Demo


Related Publications

ETL: Efficient Transfer Learning for Face Tasks

Thrupthi Ann John, Isha Dua, Vineeth N Balasubramanian and C. V. Jawahar
ETL: Efficient Transfer Learning for Face Tasks, 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 2022. [ PDF ], [ BibTeX ]

Contact

For any queries about the work, please contact the authors below

  1. Thrupthi Ann John - thrupthi [dot] ann [at] research [dot] iiit [dot] ac [dot] in
  2. Isha Dua - duaisha1994 [at] gmail [dot] com

Canonical Saliency Maps: Decoding Deep Face Models


Thrupthi Ann John[1], Vineeth N Balasubramanian[2] and C.V. Jawahar[1]

IIIT Hyderabad[1] IIT Hyderabad[2]

[ Code ]   | [ Demo Video ]


Abstract

As Deep Neural Network models for face processing tasks approach human-like performance, their deployment in critical applications such as law enforcement and access control has seen an upswing, where any failure may have far-reaching consequences. We need methods to build trust in deployed systems by making their workings as transparent as possible. Existing visualization algorithms are designed for object recognition and do not give insightful results when applied to the face domain. In this work, we present 'Canonical Saliency Maps', a new method which highlights relevant facial areas by projecting saliency maps onto a canonical face model. We present two kinds of Canonical Saliency Maps: image-level maps and model-level maps. Image-level maps highlight facial features responsible for the decision made by a deep face model on a given image, thus helping to understand how the DNN made its prediction for that image. Model-level maps provide an understanding of what the entire DNN model focuses on in each task, and thus can be used to detect biases in the model. Our qualitative and quantitative results show the usefulness of the proposed canonical saliency maps, which can be used on any deep face model regardless of its architecture.
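As a rough sketch of the projection step, the snippet below computes a plain gradient saliency map for one image, warps it into canonical face coordinates via an affine fit between detected facial landmarks and a canonical landmark set, and averages image-level maps to form a model-level map. The landmark detector, the canonical landmark set, and the use of simple gradient saliency are assumptions; the paper's exact saliency computation and projection may differ.

```python
# Hypothetical sketch: project image-level saliency onto a canonical face.
import cv2
import numpy as np
import torch

def gradient_saliency(model, image, target_idx):
    """|d score / d pixel|, reduced over channels. `image` is a CHW float tensor."""
    x = image.unsqueeze(0).requires_grad_(True)
    score = model(x)[0, target_idx]
    score.backward()
    return x.grad[0].abs().max(dim=0).values.numpy()

def to_canonical(saliency, face_pts, canonical_pts, canonical_size=(256, 256)):
    """Warp an image-space saliency map into canonical face coordinates."""
    M, _ = cv2.estimateAffinePartial2D(np.float32(face_pts),
                                       np.float32(canonical_pts))
    return cv2.warpAffine(saliency, M, canonical_size)

def model_level_map(image_level_maps):
    """Average canonical image-level maps over a dataset to expose model-wide focus."""
    return np.mean(np.stack(image_level_maps), axis=0)
```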


Demo


Related Publications

Canonical Saliency Maps: Decoding Deep Face Models
Thrupthi Ann John, Vineeth N Balasubramanian and C. V. Jawahar
Canonical Saliency Maps: Decoding Deep Face Models, IEEE Transactions on Biometrics, Behavior, and Identity Science, 2021, Volume 3, Issue 4. [ PDF ], [ BibTeX ]

Contact

For any queries about the work, please contact the authors below

  1. Thrupthi Ann John - thrupthi [dot] ann [at] research [dot] iiit [dot] ac [dot] in

Classroom Slide Narration System


Jobin K.V., Ajoy Mondal, and C.V. Jawahar

IIIT Hyderabad       {jobin.kv@research., ajoy.mondal@, and jawahar@}iiit.ac.in

[ Code ]   | [ Demo Video ]   | [ Dataset ]


The architecture of the proposed CSSNet for classroom slide segmentation. The network consists of three modules: (i) an attention module (upper dotted region), (ii) a multi-scale feature extraction module (lower region), and (iii) a feature concatenation module.

Abstract

Slide presentations are an effective and efficient tool used by the teaching community for classroom communication. However, this teaching model can be challenging for blind and visually impaired (VI) students, who require personal human assistance to understand the presented slides. This shortcoming motivates us to design a Classroom Slide Narration System (CSNS) that generates audio descriptions corresponding to the slide content. We pose this problem as an image-to-markup language generation task. The initial step is to extract logical regions such as title, text, equation, figure, and table from the slide image. In classroom slide images, the logical regions are distributed based on their location within the image. To utilize the location of the logical regions for slide image segmentation, we propose an architecture, the Classroom Slide Segmentation Network (CSSNet), whose unique attributes distinguish it from most other semantic segmentation networks. Publicly available benchmark datasets such as WiSe and SPaSe are used to validate the performance of our segmentation architecture, and we obtain a 9.54% improvement in segmentation accuracy on the WiSe dataset. We extract content (information) from the slide using four well-established modules: optical character recognition (OCR), figure classification, equation description, and table structure recognition. With this information, we build the Classroom Slide Narration System (CSNS) to help VI students understand the slide content. Users gave better feedback on the output quality of the proposed CSNS compared to existing systems such as Facebook's Automatic Alt-Text (AAT) and Tesseract.
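The sketch below only illustrates how the three modules named in the caption above (attention, multi-scale feature extraction, feature concatenation) could be wired together in PyTorch. The backbone, channel widths, dilation rates, and the exact form of attention are illustrative assumptions rather than the CSSNet specification.

```python
# Rough structural sketch of a three-module slide-segmentation network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionModule(nn.Module):
    """Spatial attention over backbone features (location cue for logical regions)."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, feats):
        return feats * self.attn(feats)

class MultiScaleModule(nn.Module):
    """Parallel dilated convolutions to capture regions at several scales."""
    def __init__(self, channels, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r) for r in rates)

    def forward(self, feats):
        return torch.cat([F.relu(b(feats)) for b in self.branches], dim=1)

class SlideSegNet(nn.Module):
    """Backbone -> attention + multi-scale branches -> concatenation -> per-pixel labels."""
    def __init__(self, backbone, channels, num_classes):
        super().__init__()
        self.backbone = backbone
        self.attention = AttentionModule(channels)
        self.multiscale = MultiScaleModule(channels)
        self.head = nn.Conv2d(channels + 3 * channels, num_classes, 1)

    def forward(self, x):
        feats = self.backbone(x)
        fused = torch.cat([self.attention(feats), self.multiscale(feats)], dim=1)
        logits = self.head(fused)
        return F.interpolate(logits, size=x.shape[-2:], mode="bilinear",
                             align_corners=False)
```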

Paper

  • Paper
    Classroom Slide Narration System

    Jobin K.V., Ajoy Mondal, and Jawahar C.V.
    Classroom Slide Narration System , CVIP, 2021.
[ PDF ]

    Updated Soon

 

Contact

  1. Jobin K.V. - jobin.kv@research.iiit.ac.in
  2. Ajoy Mondal - ajoy.mondal@iiit.ac.in
  3. Jawahar C.V. - jawahar@iiit.ac.in

3DHumans

A Rich 3D Dataset of Scanned Humans

 


 

About

The SHARP-3DHumans dataset provides around 250 meshes of people with diverse body shapes in various garment styles and sizes. We cover a wide variety of clothing styles, ranging from loose robed clothing, like the saree (a typical South-Asian dress), to relatively tight-fitting clothing, like shirts and trousers. The dataset consists of around 150 male and 50 female unique subjects, with around 180 male scans and around 70 female scans in total. In terms of regional diversity, for the first time, we capture body shape, appearance, and clothing styles for the South-Asian population. For each 3D mesh, a pre-registered SMPL body is also included.
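A minimal loading sketch for one subject follows. The file names, formats (.obj scan, .npz SMPL parameters), and the use of trimesh are assumptions about how a downloaded sample might be organised, not the dataset's documented layout.

```python
# Hypothetical loader for one scanned subject and its pre-registered SMPL body.
import numpy as np
import trimesh

def load_subject(mesh_path, smpl_param_path):
    """Load a textured scan mesh and its SMPL parameter arrays."""
    scan = trimesh.load(mesh_path, process=False)          # high-resolution scan
    params = np.load(smpl_param_path, allow_pickle=True)   # e.g. pose/shape arrays
    return scan, {k: np.asarray(v) for k, v in dict(params).items()}

# Example (paths are placeholders):
# scan, smpl = load_subject("subject_001/scan.obj", "subject_001/smpl_params.npz")
# print(scan.vertices.shape, list(smpl.keys()))
```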

Quality


The dataset is collected using an Artec3D Eva handheld structured-light scanner. The scanner has a 3D point accuracy of up to 0.1 mm and a 3D resolution of 0.5 mm, enabling the capture of high-frequency geometric details along with high-resolution texture maps. The subjects were scanned in a studio environment with controlled lighting and uniform illumination.

Technical Paper

[To be announced]

 

Download Video

Please click here to download

Download Sample

Please click here to download

Handwritten Text Retrieval from Unlabeled Collections


Santhoshini Gongidi and C.V. Jawahar

[ Paper ]   | [ Demo ]

Abstract

Handwritten documents from domains such as cultural heritage, the judiciary, and modern journals remain largely unexplored even today. To a great extent, this is due to the lack of retrieval tools for such unlabeled document collections. In this work, we consider such collections and present a simple, robust retrieval framework for easy information access. We achieve retrieval on unlabeled novel collections through invariant features learnt for handwritten text. These feature representations enable zero-shot retrieval for novel queries on unexplored collections. We improve the framework further by supporting search via text and exemplar queries. Four new collections written in English, Malayalam, and Bengali are used to evaluate our text retrieval framework. These collections comprise 2957 handwritten pages and over 300K words. We report promising results on these collections, despite the zero-shot constraint and huge collection size. Our framework allows the addition of new collections without any need for specific finetuning or labeling. Finally, we also present a demonstration of the retrieval framework.
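As a generic illustration of the embed-and-search setup, the sketch below indexes word-image embeddings once and then ranks them against either a text query or an exemplar image query by cosine similarity. The two encoders (`image_encoder`, `text_encoder`) are hypothetical placeholders for the framework's learnt invariant feature extractors, not the paper's specific architecture.

```python
# Generic zero-shot word retrieval sketch: embed once, search by cosine similarity.
import numpy as np

def build_index(word_images, image_encoder):
    """Embed every segmented word image of a collection once, offline."""
    feats = np.stack([image_encoder(img) for img in word_images])
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

def search(query, index, encoder, top_k=10):
    """Rank collection words against a text or exemplar query."""
    q = encoder(query)
    q = q / np.linalg.norm(q)
    scores = index @ q
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]

# Text query:     search("court", index, text_encoder)
# Exemplar query: search(cropped_word_image, index, image_encoder)
```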


Demo link: HW-Search


Teaser Video:


Related Publications

Santhoshini Gongidi and C. V. Jawahar, Handwritten Text Retrieval from Unlabeled Collections, CVIP, 2021.

Contact

For any queries about the work, please contact the authors below

  1. Santhoshini Gongidi