## ETL: Efficient Transfer Learning for Face Tasks

### [ Video ]   | [ PDF ]

A pipeline for efficiently transferring parameters from a model trained on a primary task, such as face recognition, to models for secondary tasks including gender, emotion, head pose, and age estimation, in one pass. The ETL technique identifies and preserves only the task-related filters, which results in a highly sparse network for efficient training of face-related tasks.
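The filter-selection step can be sketched as follows. This is an illustrative pruning sketch, assuming per-filter importance scores for the secondary task are already available (the specific cross-task criterion ETL uses is described in the paper); `prune_to_task_filters` and `keep_ratio` are hypothetical names, not the authors' implementation:

```python
import numpy as np

def prune_to_task_filters(weights, importance, keep_ratio=0.1):
    """Zero out all but the most task-relevant filters of a conv layer.

    weights:    (num_filters, ...) weight tensor from the pre-trained model
    importance: per-filter relevance scores for the secondary task
                (assumed given; ETL derives these from cross-task awareness)
    keep_ratio: fraction of filters to retain
    """
    num_keep = max(1, int(len(importance) * keep_ratio))
    keep = np.argsort(importance)[-num_keep:]   # indices of the top filters
    mask = np.zeros(len(importance), dtype=bool)
    mask[keep] = True
    pruned = weights.copy()
    pruned[~mask] = 0.0                         # sparse transferred layer
    return pruned, mask
```

Applying this to every layer yields the highly sparse transferred network; only the retained filters then need to be fine-tuned on the secondary task.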

## Abstract

Transfer learning is a popular method for obtaining deep trained models for data-scarce face tasks such as head pose and emotion. However, current transfer learning methods are inefficient and time-consuming, as they do not fully account for the relationships between related tasks. Moreover, the transferred model is large and computationally expensive. As an alternative, we propose ETL: a technique that efficiently transfers a pre-trained model to a new task by retaining only *cross-task aware filters*, resulting in a sparse transferred model. We demonstrate the effectiveness of ETL by transferring VGGFace, a popular face recognition model, to four diverse face tasks. Our experiments show that we attain a size reduction of up to 97% and an inference-time reduction of up to 94% while retaining 99.5% of the baseline transfer learning accuracy.

## Related Publications

##### ETL: Efficient Transfer Learning for Face Tasks

Thrupthi Ann John, Isha Dua, Vineeth N Balasubramanian and C. V. Jawahar
ETL: Efficient Transfer Learning for Face Tasks, 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 2022. [ PDF ], [ BibTeX ]

## Contact

1. Thrupthi Ann John - thrupthi [dot] ann [at] research [dot] iiit [dot] ac [dot] in
2. Isha Dua: duaisha1994 [at] gmail [dot] com

## Canonical Saliency Maps: Decoding Deep Face Models

## Abstract

As Deep Neural Network models for face processing tasks approach human-like performance, their deployment in critical applications such as law enforcement and access control has seen an upswing, where any failure may have far-reaching consequences. We need methods to build trust in deployed systems by making their workings as transparent as possible. Existing visualization algorithms are designed for object recognition and do not give insightful results when applied to the face domain. In this work, we present 'Canonical Saliency Maps', a new method that highlights relevant facial areas by projecting saliency maps onto a canonical face model. We present two kinds of Canonical Saliency Maps: image-level maps and model-level maps. Image-level maps highlight facial features responsible for the decision made by a deep face model on a given image, thus helping to understand how a DNN made a prediction on the image. Model-level maps provide an understanding of what the entire DNN model focuses on in each task, and thus can be used to detect biases in the model. Our qualitative and quantitative results show the usefulness of the proposed canonical saliency maps, which can be used on any deep face model regardless of the architecture.
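The projection idea can be illustrated with a minimal sketch: saliency values sampled at detected facial landmarks are accumulated at the corresponding positions on a canonical face, and a model-level map averages the image-level ones. The sparse landmark correspondence and canonical resolution here are simplifying assumptions, not the paper's dense projection onto a canonical face model:

```python
import numpy as np

def project_saliency(saliency, landmarks, canon_landmarks, canon_shape=(64, 64)):
    """Project an image-level saliency map onto a canonical face.

    saliency:        (H, W) saliency map for one face image
    landmarks:       (x, y) landmark positions detected in the image
    canon_landmarks: matching (x, y) positions on the canonical face
    """
    canon = np.zeros(canon_shape)
    for (x, y), (cx, cy) in zip(landmarks, canon_landmarks):
        # accumulate the image's saliency at the canonical position
        canon[int(cy), int(cx)] += saliency[int(y), int(x)]
    return canon

def model_level_map(image_level_maps):
    """Model-level map: average of canonical image-level maps over a dataset."""
    return np.mean(image_level_maps, axis=0)
```

Because every image is mapped into the same canonical coordinate frame, maps from faces with different poses and shapes become directly comparable and can be averaged meaningfully.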

## Related Publications

##### Canonical Saliency Maps: Decoding Deep Face Models

Thrupthi Ann John, Vineeth N Balasubramanian and C. V. Jawahar
Canonical Saliency Maps: Decoding Deep Face Models, IEEE Transactions on Biometrics, Behavior, and Identity Science, 2021, Volume 3, Issue 4. [ PDF ], [ BibTeX ]

## Contact

1. Thrupthi Ann John - thrupthi [dot] ann [at] research [dot] iiit [dot] ac [dot] in

## Classroom Slide Narration System

### [ Code ]   | [ Demo Video ]   | [ Dataset ]

The architecture of the proposed CSSNet for classroom slide segmentation. The network consists of three modules: (i) an attention module (upper dotted region), (ii) a multi-scale feature extraction module (lower region), and (iii) a feature concatenation module.

## Abstract

Slide presentations are an effective and efficient tool used by the teaching community for classroom communication. However, this teaching model can be challenging for blind and visually impaired (VI) students, who require personal human assistance to understand the presented slides. This shortcoming motivates us to design a Classroom Slide Narration System (CSNS) that generates audio descriptions corresponding to the slide content. We pose this problem as an image-to-markup language generation task. The initial step is to extract logical regions such as title, text, equation, figure, and table from the slide image. In classroom slide images, the logical regions are distributed according to their location within the image. To exploit the location of logical regions for slide image segmentation, we propose the Classroom Slide Segmentation Network (CSSN), whose unique attributes distinguish it from most other semantic segmentation networks. Publicly available benchmark datasets such as WiSe and SPaSe are used to validate the performance of our segmentation architecture. We obtain a 9.54% improvement in segmentation accuracy on the WiSe dataset. We extract content (information) from the slide using four well-established modules: optical character recognition (OCR), figure classification, equation description, and table structure recognition. With this information, we build the Classroom Slide Narration System (CSNS) to help VI students understand the slide content. Users gave better feedback on the quality of the proposed CSNS's output than on that of existing systems such as Facebook's Automatic Alt-Text (AAT) and Tesseract.
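The final narration step can be sketched as a simple template-based assembly over the recognized regions. The region labels and narration templates below are illustrative assumptions, not the system's actual markup or phrasing:

```python
def narrate_slide(regions):
    """Assemble an audio-ready narration from segmented slide regions.

    regions: list of (label, content) pairs produced by the segmentation
    and recognition modules (OCR, figure classification, equation
    description, table structure recognition) -- here simplified to
    pre-extracted text for each region.
    """
    templates = {
        "title":    "The slide is titled: {}.",
        "text":     "{}",
        "equation": "It contains the equation: {}.",
        "figure":   "There is a figure showing {}.",
        "table":    "A table follows: {}.",
    }
    # fall back to reading the raw content for unknown region labels
    return " ".join(templates.get(label, "{}").format(content)
                    for label, content in regions)
```

The resulting string would then be passed to a text-to-speech engine to produce the audio description for the VI student.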

## Paper

##### Classroom Slide Narration System

Jobin K.V., Ajoy Mondal, and Jawahar C.V.
Classroom Slide Narration System, CVIP, 2021. [ PDF ]

## Contact

1. Jobin K.V.
2. Ajoy Mondal
3. Jawahar C.V.

## A Rich 3D Dataset of Scanned Humans

The SHARP-3DHumans dataset provides around 250 meshes of people with diverse body shapes in various garment styles and sizes. We cover a wide variety of clothing styles, ranging from loose, robed clothing, such as the saree (a typical South-Asian dress), to relatively tight-fitting clothing, such as shirts and trousers. The dataset consists of around 150 male and 50 female unique subjects, with around 180 male scans and around 70 female scans in total. In terms of regional diversity, we capture body shape, appearance, and clothing styles for the South-Asian population for the first time. For each 3D mesh, a pre-registered SMPL body is also included.

## Quality

The dataset is collected using an Artec3D Eva handheld structured-light scanner. The scanner has a 3D point accuracy of up to 0.1 mm and a 3D resolution of 0.5 mm, enabling the capture of high-frequency geometric details along with high-resolution texture maps. The subjects were scanned in a studio environment with controlled lighting and uniform illumination.


## Handwritten Text Retrieval from Unlabeled Collections

## Abstract

Handwritten documents from communities like cultural heritage, judiciary, and modern journals remain largely unexplored even today. To a great extent, this is due to the lack of retrieval tools for such unlabeled document collections. In this work, we consider such collections and present a simple, robust retrieval framework for easy information access. We achieve retrieval on unlabeled novel collections through invariant features learnt for handwritten text. These feature representations enable zero-shot retrieval for novel queries on unexplored collections. We improve the framework further by supporting search via text and exemplar queries. Four new collections written in English, Malayalam, and Bengali are used to evaluate our text retrieval framework. These collections comprise 2957 handwritten pages and over 300K words. We report promising results on these collections, despite the zero-shot constraint and huge collection size. Our framework allows the addition of new collections without any need for specific finetuning or labeling. Finally, we also present a demonstration of the retrieval framework.
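The zero-shot retrieval step can be sketched as nearest-neighbour search in the learned embedding space: the query (text or exemplar image) is embedded with the invariant feature network, and word images from the unlabeled collection are ranked by cosine similarity. The `retrieve` function below is an illustrative assumption about this ranking stage, not the authors' exact pipeline:

```python
import numpy as np

def retrieve(query_emb, word_embs, top_k=5):
    """Rank word images by cosine similarity to a query embedding.

    query_emb: (d,) embedding of the text or exemplar query
    word_embs: (n, d) embeddings of all word images in the collection
    returns:   indices of the top-k matches and their similarity scores
    """
    q = query_emb / np.linalg.norm(query_emb)
    w = word_embs / np.linalg.norm(word_embs, axis=1, keepdims=True)
    scores = w @ q                      # cosine similarity per word image
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]
```

Because retrieval only needs embeddings, a new collection can be indexed by a single forward pass over its word images, with no collection-specific fine-tuning or labeling.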

Teaser Video:

## Related Publications

Santhoshini Gongidi and C. V. Jawahar, Handwritten Text Retrieval from Unlabeled Collections, CVIP, 2021.