CVIT Projects

Abstract

Person recognition methods that use multiple body regions have shown significant improvements over traditional face-based recognition. One of the primary challenges in full-body person recognition is the extreme variation in pose and viewpoint. In this work, (i) we present an approach that tackles pose variations using multiple models that are trained on specific poses and combined using pose-aware weights during testing. (ii) For learning a person representation, we propose a network that jointly optimizes a single loss over multiple body regions. (iii) Finally, we introduce new benchmarks to evaluate person recognition in diverse scenarios and show significant improvements over previously proposed approaches on all the benchmarks, including the photo album setting of PIPA.
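As a rough illustration of point (i), the sketch below fuses identity scores from pose-specific models using per-pose weights. It is a minimal sketch under stated assumptions, not the paper's implementation: the pose names, the weighted-sum fusion rule, and the function names `fuse_pose_scores` and `predict_identity` are all hypothetical.

```python
def fuse_pose_scores(per_pose_scores, pose_weights):
    """Weighted sum of identity scores from pose-specific models.

    per_pose_scores: {pose: [score for each identity]} (hypothetical layout)
    pose_weights:    {pose: non-negative weight}, e.g. from a pose estimator
    """
    total = sum(pose_weights.values())
    if total <= 0:
        raise ValueError("pose weights must sum to a positive value")
    num_identities = len(next(iter(per_pose_scores.values())))
    fused = [0.0] * num_identities
    for pose, scores in per_pose_scores.items():
        # Normalise so the weights sum to one; unseen poses get weight 0.
        weight = pose_weights.get(pose, 0.0) / total
        for i, score in enumerate(scores):
            fused[i] += weight * score
    return fused


def predict_identity(per_pose_scores, pose_weights):
    """Index of the identity with the highest fused score."""
    fused = fuse_pose_scores(per_pose_scores, pose_weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Assumed example: two pose-specific models scoring two identities.
scores = {"frontal": [0.7, 0.3], "profile": [0.2, 0.8]}
frontal_query = {"frontal": 0.9, "profile": 0.1}
profile_query = {"frontal": 0.1, "profile": 0.9}
```

With these assumed numbers, a query judged mostly frontal follows the frontal model, while a query judged mostly profile follows the profile model.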

Links

Paper

Datasets

PIPA: Dataset Pose Annotations

IMDB: Dataset

Hannah: Dataset

Soccer: Dataset

Software

code models

Citation

@InProceedings{vijaycvpr15,
  author    = "Vijay Kumar and Anoop Namboodiri and Manohar Paluri and Jawahar, C.~V.",
  title     = "Pose-Aware Person Recognition",
  booktitle = "Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition",
  year      = "2017"
}

References

1. N. Zhang et al., Beyond Frontal Faces: Improving Person Recognition Using Multiple Cues, CVPR 2015.

2. Oh et al., Person Recognition in Personal Photo Collections, ICCV 2015.

3. Li et al., A Multi-level Contextual Model for Person Recognition in Photo Albums, CVPR 2016.

4. Ozerov et al., On Evaluating Face Tracks in Movies, ICIP 2013.

Acknowledgements

Vijay Kumar is partly supported by TCS PhD Fellowship 2012.

Contact

For any comments and suggestions, please email Vijay.

Copyright Notice

The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright.

From Traditional to Modern: Domain Adaptation for Action Classification in Short Social Video Clips

Abstract

Short internet video clips like vines present a significantly wilder distribution than traditional video datasets. In this paper, we focus on the problem of unsupervised action classification in wild vines using traditional labeled datasets. To this end, we use a simple domain adaptation strategy based on data augmentation. We utilise the semantic word2vec space as a common subspace in which to embed video features from both the labeled source domain and the unlabelled target domain. Our method incrementally augments the labeled source with target samples and iteratively modifies the embedding function to bring the source and target distributions together. Additionally, we utilise a multi-modal representation that incorporates noisy semantic information available in the form of hash-tags. We show the effectiveness of this simple adaptation technique on a test set of vines and achieve notable improvements in performance.

Challenges

The videos we target come from vine.co. They are recorded by users in unconstrained environments and contain significant camera shake, lighting variability, abrupt shot changes, etc. The challenge is to use an existing dataset of action videos to gather relevant vines without investing manual labour.

Another challenge is to merge the visual, textual and hash-tag information of a vine to perform the segregation described above.

Contribution

We attempt to solve the problem of classifying actions in vines by adapting classifiers trained on a source dataset to the target dataset. We do this by iteratively selecting high-confidence videos and modifying the learnt embedding function.
We also provide the 3000 vine videos used in our work, along with their hash-tags as provided by the uploaders.
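The iterative selection of high-confidence target videos can be sketched as a self-training loop. This is only an illustrative sketch: a nearest-centroid classifier stands in for the learnt embedding function, and the confidence measure, the `threshold` value, and the names `centroids`, `predict`, and `adapt` are assumptions rather than details taken from the paper.

```python
import math


def centroids(samples):
    """Mean feature vector per label; samples is a list of (vector, label)."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(model, vec):
    """Return (label, confidence) using a softmax over negative distances."""
    weights = {lab: math.exp(-math.dist(vec, c)) for lab, c in model.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


def adapt(source, target, threshold=0.8, max_rounds=10):
    """Grow the labeled set with confidently pseudo-labeled target samples."""
    labeled = list(source)      # (vector, label) pairs from the source domain
    remaining = list(target)    # unlabeled target vectors
    for _ in range(max_rounds):
        model = centroids(labeled)  # stand-in for re-fitting the embedding
        confident, rest = [], []
        for vec in remaining:
            label, conf = predict(model, vec)
            (confident if conf >= threshold else rest).append((vec, label))
        if not confident:
            break
        labeled.extend(confident)
        remaining = [vec for vec, _ in rest]
    return centroids(labeled), remaining
```

Each round re-fits the model on the augmented set, so target samples accepted early shift the decision regions used for later rounds; samples that never reach the threshold are returned unlabeled.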

[Figure: flowchart from the GCPR paper]

Dataset

Name	Link	Description
Code.tar	Link	Code for running the main classification along with other utility programs
Vine.tar	Link	Download link for vines
Paper.pdf	Link	GCPR submission
Supplementary	Link	Supplementary Submission

Results

[Figure: performance table]

Associated People

Learning to Hash-tag Videos with Tag2Vec

Abstract

User-given tags or labels are valuable resources for semantic understanding of visual media such as images and videos. Recently, a new type of labeling mechanism known as hashtags has become increasingly popular on social media sites. In this paper, we study the problem of generating relevant and useful hash-tags for short video clips. Traditional data-driven approaches for tag enrichment and recommendation use direct visual similarity for label transfer and propagation. We attempt to learn a direct low-cost mapping from video to hash-tags using a two-step training process. We first employ a natural language processing (NLP) technique, skip-gram models with neural network training, to learn a low-dimensional vector representation of hash-tags (Tag2Vec) using a corpus of ∼10 million hash-tags. We then train an embedding function to map video features to the low-dimensional Tag2Vec space. We learn this embedding for 29 categories of short video clips with hash-tags. A query video without any tag information can then be directly mapped to the vector space of tags using the learned embedding, and relevant tags can be found by performing a simple nearest-neighbor retrieval in the Tag2Vec space. We validate the relevance of the tags suggested by our system qualitatively and quantitatively with a user study.
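The final retrieval step can be illustrated with a small sketch. It assumes a linear embedding matrix `W` and cosine similarity as the nearest-neighbor measure; neither choice is stated in the abstract, and the tag vectors and function names below are hypothetical toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embed(video_feature, W):
    """Map a video feature into tag space with a linear map (an assumption)."""
    return [sum(w * x for w, x in zip(row, video_feature)) for row in W]


def suggest_tags(video_feature, W, tag_vectors, k=3):
    """Return the k tags whose vectors lie closest to the embedded video."""
    query = embed(video_feature, W)
    ranked = sorted(tag_vectors,
                    key=lambda tag: cosine(query, tag_vectors[tag]),
                    reverse=True)
    return ranked[:k]


# Toy example: 3-d video features mapped to a 2-d tag space.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
tag_vectors = {"#dog": [1.0, 0.0], "#puppy": [0.9, 0.3], "#soccer": [0.0, 1.0]}
top_tags = suggest_tags([1.0, 0.05, 5.0], W, tag_vectors, k=2)
```

In this toy setup the query lands near the "#dog" direction, so "#dog" and "#puppy" rank ahead of "#soccer". A real system would learn `W` and the tag vectors from data.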

Aim

The videos we target come from vine.co. They are recorded by users in unconstrained environments and contain significant camera shake, lighting variability, abrupt shot changes, etc. We use the folksonomies associated with the videos by their uploaders to create a vector space. This Tag2Vec space is trained using ~2 million hash-tags downloaded for ~15000 categories. The main motivation is to create a plug-and-use system for categorising vines.

Contribution

We create a system for easily tagging vines
We provide training sentences comprised of hashtags, and also vines+hash tags for test categories

Dataset

Name	Link	Description
Code.tar	Link	Code for running the main classification along with other utility programs
Vine.tar	Link	Download link for vines
HashTags.tar	Link	Training Hashtags
Paper.pdf	Link	GCPR submission
Supplementary	Link	Supplementary Submission

Approach

Qualitative Results

We conduct a user study in which a user is presented with a video and 15 tags suggested by our system. The user marks the relevant tags, and we then compute the average number of relevant tags across classes.
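The averaging step could be computed along the following lines. This sketch assumes a per-class mean followed by an unweighted mean over classes; the exact averaging scheme is not stated above, and the class names and counts are made-up placeholders, not study results.

```python
def mean_relevant_tags(marks_by_class):
    """Average number of relevant tags per class, then across classes.

    marks_by_class: {class name: [relevant-tag count for each rated video]}
    Each count is out of the 15 suggested tags.
    """
    per_class = {name: sum(counts) / len(counts)
                 for name, counts in marks_by_class.items() if counts}
    if not per_class:
        raise ValueError("no ratings provided")
    overall = sum(per_class.values()) / len(per_class)
    return per_class, overall


# Placeholder ratings for illustration only.
example = {"dance": [10, 12], "cooking": [6]}
per_class, overall = mean_relevant_tags(example)
```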

Associated People

The IIIT-CFW dataset

About

IIIT-CFW is a database of cartoon faces in the wild. It is harvested from Google image search. Query words such as Obama + cartoon, Modi + cartoon, and so on were used to collect cartoon images of 100 public figures. The dataset contains 8928 annotated cartoon faces of famous personalities from around the world with varying professions. Additionally, we provide 1000 real faces of the public figures to study cross-modal retrieval tasks such as Photo2Cartoon retrieval. IIIT-CFW can be used to study a spectrum of problems, as discussed in our paper.

Keywords: Cartoon faces, face synthesis, sketches, heterogeneous face recognition, cross modal retrieval, caricature.

Downloads

IIIT-CFW (133.5 MB)
Usage:
Task 1 -- cartoon face classification
Task 2 -- Photo2Cartoon
README

Older version:

IIIT-CFW (85 MB)

Updates

Related Publications

Ashutosh Mishra, Shyam Nandan Rai, Anand Mishra and C. V. Jawahar, IIIT-CFW: A Benchmark Database of Cartoon Faces in the Wild, 1st workshop on visual analysis and sketch (ECCVW) 2016. [PDF]

Bibtex

If you use this dataset, please cite:

@InProceedings{MishraECCV16,
  author    = "Mishra, Ashutosh and Rai, Shyam Nandan and Mishra, Anand and Jawahar, C.~V.",
  title     = "IIIT-CFW: A Benchmark Database of Cartoon Faces in the Wild",
  booktitle = "VASE ECCVW",
  year      = "2016",
}

Contact

For any queries about the dataset, feel free to contact Anand Mishra.

Visual Aesthetic Analysis for Handwritten Document Images

Abstract

We present an approach for analyzing the visual aesthetic property of a handwritten document page which matches with human perception. We formulate the problem at two independent levels: (i) coarse level which deals with the overall layout, space usages between lines, words and margins, and (ii) fine level, which analyses the construction of each word and deals with the aesthetic properties of writing styles. We present our observations on multiple local and global features which can extract the aesthetic cues present in the handwritten documents.

[Figure: project flowchart]

Major Contributions

A novel approach for determining the aesthetic value of Handwritten Document Images.
Identified relevant features helpful in quantifying the human notion for aesthetics of handwriting.
Dataset of 275 pages of handwritten document images along with their neatness annotations.

Qualitative Results

Qualitative analysis of the proposed method in predicting the aesthetic measure of the document. (a) The top scoring document images, having properly aligned and beautiful handwritten text. (b) The lowest scoring document images where one can observe inconsistent word spacing, skew and highly irregular word formation. (c) Sample pairs of word images from the human verification experiment where the words in the first column are predicted better than the words in the second column whereas the third column denotes whether the prediction agrees with human judgment.

Related Publications

Anshuman Majumdar, Praveen Krishnan and C.V. Jawahar - Visual Aesthetic Analysis for Handwritten Document Images, 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), Shenzhen, China, 2016. [PDF]

Codes

Coming Soon...

Associated People

Anshuman Majumdar
Praveen Krishnan
C.V. Jawahar
