The IIIT-CFW dataset

 



About

IIIT-CFW is a database of cartoon faces in the wild, harvested from Google image search. Query words such as Obama + cartoon, Modi + cartoon, and so on were used to collect cartoon images of 100 public figures. The dataset contains 8928 annotated cartoon faces of famous personalities from around the world and from varying professions. Additionally, we provide 1000 real faces of these public figures to study cross-modal retrieval tasks such as Photo2Cartoon retrieval. IIIT-CFW can be used to study a spectrum of problems, as discussed in our paper.


Keywords: Cartoon faces, face synthesis, sketches, heterogeneous face recognition, cross modal retrieval, caricature.


Downloads

IIIT-CFW (133.5 MB)
Usage:
Task 1 -- cartoon face classification
Task 2 -- Photo2Cartoon
README

 

Older version:

IIIT-CFW (85 MB)

Updates


Related Publications

 

  • Ashutosh Mishra, Shyam Nandan Rai, Anand Mishra and C. V. Jawahar, IIIT-CFW: A Benchmark Database of Cartoon Faces in the Wild, 1st workshop on visual analysis and sketch (ECCVW) 2016. [PDF]


Bibtex

If you use this dataset, please cite:

@InProceedings{MishraECCV16,
  author    = "Mishra, A. and Nandan Rai, S. and Mishra, A. and Jawahar, C.~V.",
  title     = "IIIT-CFW: A Benchmark Database of Cartoon Faces in the Wild",
  booktitle = "VASE ECCVW",
  year      = "2016",
}

Contact

For any queries about the dataset, feel free to contact Anand Mishra.

Visual Aesthetic Analysis for Handwritten Document Images


Abstract

We present an approach for analyzing the visual aesthetic property of a handwritten document page in a way that matches human perception. We formulate the problem at two independent levels: (i) a coarse level, which deals with the overall layout and the use of space between lines, words and margins, and (ii) a fine level, which analyses the construction of each word and deals with the aesthetic properties of writing styles. We present our observations on multiple local and global features which can extract the aesthetic cues present in handwritten documents.



Major Contributions

  • A novel approach for determining the aesthetic value of handwritten document images.
  • Identification of relevant features that help quantify the human notion of handwriting aesthetics.
  • A dataset of 275 pages of handwritten document images along with their neatness annotations.

 


Qualitative Results


Qualitative analysis of the proposed method in predicting the aesthetic measure of the document. (a) The top scoring document images, having properly aligned and beautiful handwritten text. (b) The lowest scoring document images where one can observe inconsistent word spacing, skew and highly irregular word formation. (c) Sample pairs of word images from the human verification experiment where the words in the first column are predicted better than the words in the second column whereas the third column denotes whether the prediction agrees with human judgment.


Related Publications

 

  • Anshuman Majumdar, Praveen Krishnan and C.V. Jawahar - Visual Aesthetic Analysis for Handwritten Document Images, 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), Shenzhen, China, 2016. [PDF]


Codes

Coming Soon...


Associated People

  • Anshuman Majumdar
  • Praveen Krishnan
  • C.V. Jawahar

Information Retrieval from Large Document Image Collections


Motivation

 


 

We focus on the challenging problem of information retrieval from large document image collections. We propose to develop algorithms and approaches that are scalable to large datasets. We use and extend ideas from machine learning (ML), information retrieval (IR) and computer vision (CV) for this task. Our results are expected to impact the way retrieval is carried out from document images (documents that have textual content but are stored in image format). Effective retrieval systems built over textual content (often crawled from the web) have changed the way we look at multimedia collections. Since we work on images, traditional IR solutions are not directly applicable. A popular approach is to recognize the images (e.g. with an OCR) and build a textual representation. However, recognizers can be brittle and produce noisy outputs in many practical settings (e.g. historic documents, handwritten documents, Indian language documents). We design representations that can scale seamlessly to millions of document images starting from a small corpus of annotated data.


Word Image Retrieval using Bag of Visual Words

 


 

 

In this work, we present a Bag of Visual Words (BoVW) based approach to retrieve similar word images from a large database, efficiently and accurately. We show that a text retrieval system can be adapted to build a word image retrieval solution. This helps in achieving scalability. We demonstrate the method on more than 1 Million word images with a sub-second retrieval time. We validate the method on four Indian languages, and report a mean average precision of more than 0.75. To address the lack of spatial structure in the BoVW representation, we re-rank the retrieved list.
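The idea of adapting a text retrieval system to visual words can be sketched as follows. This is a minimal illustration and not the implementation used in this work: it assumes each word image has already been converted into a list of integer visual-word IDs (feature extraction and quantization are omitted), and it uses a plain TF-IDF inverted index with cosine scoring. The function names build_index and search are hypothetical, and the spatial re-ranking step is not shown.

```python
import math
from collections import Counter, defaultdict


def build_index(images):
    """Build a TF-IDF inverted index over visual words.

    `images` maps an image id to its list of visual-word ids
    (assumed to come from an earlier quantization step).
    """
    n = len(images)
    doc_freq = Counter()
    for words in images.values():
        doc_freq.update(set(words))
    # Smoothed inverse document frequency, so no weight is zero.
    idf = {w: math.log(n / df) + 1.0 for w, df in doc_freq.items()}

    postings = defaultdict(list)  # visual word -> [(image id, weight)]
    for image_id, words in images.items():
        vec = {w: c * idf[w] for w, c in Counter(words).items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        for w, v in vec.items():
            postings[w].append((image_id, v / norm))
    return postings, idf


def search(query_words, postings, idf, top_k=5):
    """Rank indexed images by cosine similarity to the query."""
    tf = Counter(w for w in query_words if w in idf)
    vec = {w: c * idf[w] for w, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    scores = defaultdict(float)
    # Only images sharing at least one visual word are touched.
    for w, v in vec.items():
        for image_id, weight in postings[w]:
            scores[image_id] += (v / norm) * weight
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy database: three word images described by made-up visual-word ids.
database = {
    "img_a": [3, 3, 7, 12],
    "img_b": [3, 7, 7, 12],
    "img_c": [40, 41, 42],
}
postings, idf = build_index(database)
results = search([3, 3, 7, 12], postings, idf)
print(results)
```

Because scoring only walks the posting lists of the visual words present in the query, images that share no visual word with the query are never visited, which is what lets this style of index scale to large collections.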

 

Key Features

  • Language-independent system: demonstrated on 5 different languages.
  • Scalable to huge datasets: demonstrated on 1 million images.
  • Handles noisy document images: demonstrated on a dataset for which commercial OCRs fail.

Related Publications

  • Ravi Sekhar and C V Jawahar - Word Image Retrieval Using Bag of Visual Words, Proceedings of the 10th IAPR International Workshop on Document Analysis Systems, 27-29 Mar. 2012, ISBN 978-1-4673-0868-7, pp. 297-301, Queensland, Australia. [PDF] [Poster] [bibtex]

 


 

Content Level Access to Digital Library of India Pages

 

In this work, we propose a framework for content level access to the scanned pages of Digital Library of India (DLI). We propose a search scheme which fuses noisy OCR output and holistic visual features for content level access to the DLI pages. Visual content is captured using Bag of Visual Words (BoVW) approach. The fusion scheme improves over the individual methods in terms of mean Average Precision (mAP) and mean precision at 10 (mPrec@10). We exploit the fact that OCR has a high precision while BoVW has a high recall.
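The two evaluation measures named above have standard definitions, sketched here for reference. The ranked list and relevance set in the example are made up for illustration and are not results from this work; the helper names are hypothetical.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k retrieved items that are relevant."""
    top = ranked[:k]
    return sum(1 for item in top if item in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item occurs."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_over_queries(metric, runs, **kwargs):
    """Average a per-query metric over (ranked list, relevant set) pairs."""
    return sum(metric(ranked, rel, **kwargs) for ranked, rel in runs) / len(runs)


# Hypothetical query: pages p1 and p2 are relevant.
ranked = ["p1", "p4", "p2", "p9"]
relevant = {"p1", "p2"}
print(average_precision(ranked, relevant))    # (1/1 + 2/3) / 2
print(precision_at_k(ranked, relevant, k=2))  # 1 relevant item in the top 2
```

mAP is then mean_over_queries(average_precision, runs) and mPrec@10 is mean_over_queries(precision_at_k, runs, k=10), where runs holds one (ranked list, relevant set) pair per query.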

Digital Library of India

Digital Library of India (DLI) has emerged as one of the largest collections of document images in Indian scripts. DLI, as a part of the Million Book Project (MBP), has contributed to free access to knowledge for billions of people. In addition, it has helped in digitally archiving rare and precious books in many Indian languages. All this digital content is stored as scanned images of printed documents. A major challenge presently faced by the DLI is the lack of content level access to the individual pages.

 


 

Results


 

 


Associated People

 

Deep Feature Embedding for Accurate Recognition and Retrieval of Handwritten Text


Abstract

We propose a deep convolutional feature representation that achieves superior performance for word spotting and recognition on handwritten images. We focus on (i) enhancing the discriminative ability of the convolutional features using a reduced feature representation that can scale to large datasets, and (ii) enabling query-by-string by learning a common subspace for image and text using the embedded attribute framework. We present our results on popular datasets such as the IAM corpus and historical document collections from the Bentham and George Washington pages. On the challenging IAM dataset, we achieve a state-of-the-art mAP of 91.58% on word spotting using textual queries and a mean word error rate of 6.69% for the word recognition task.

 



Major Contributions

  • Improving the discriminative ability of deep CNN features using HWNet [1] and achieving state-of-the-art results for handwritten word spotting and recognition tasks on IAM, George Washington and Bentham handwritten pages.
  • Learning a reduced feature representation that can scale to large datasets.
  • Enabling query-by-string by learning a common subspace for image and text using the embedded attribute framework.
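Query-by-string needs a fixed-length vector for a text string. In the embedded attribute framework [2] this role is played by the pyramidal histogram of characters (PHOC). The sketch below is a simplified version under stated assumptions: a lowercase-plus-digits alphabet, pyramid levels 2 to 5, and no bigram part. The learned projection of image and text features into the common subspace is not shown, and the function name phoc is chosen here for illustration.

```python
import string

ALPHABET = string.ascii_lowercase + string.digits  # assumed alphabet
LEVELS = (2, 3, 4, 5)                               # assumed pyramid levels


def phoc(word, levels=LEVELS, alphabet=ALPHABET):
    """Binary vector marking which characters occur in which word region.

    At each level the word is split into `level` equal regions. A character
    is assigned to a region when at least half of the character's own span
    falls inside that region.
    """
    word = word.lower()
    n = len(word)
    index = {ch: i for i, ch in enumerate(alphabet)}
    vector = [0] * (len(alphabet) * sum(levels))
    offset = 0
    for level in levels:
        for region in range(level):
            # Work in integer units of 1 / (n * level) to avoid float error.
            r_start, r_end = region * n, (region + 1) * n
            for pos, ch in enumerate(word):
                if ch not in index:
                    continue  # characters outside the alphabet are ignored
                c_start, c_end = pos * level, (pos + 1) * level
                overlap = min(c_end, r_end) - max(c_start, r_start)
                if 2 * overlap >= level:  # overlap covers >= half the character
                    vector[offset + index[ch]] = 1
            offset += len(alphabet)
    return vector


v = phoc("ab")
print(len(v))         # 36 characters * (2 + 3 + 4 + 5) regions
print(v[0], v[36 + 1])  # 'a' in the first half, 'b' in the second half
```

Two strings that share characters in the same relative positions get similar vectors, which is what makes this representation usable as a target for attribute learning.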

Qualitative Results



Related Publications

  • Praveen Krishnan, Kartik Dutta and C.V. Jawahar - Deep Feature Embedding for Accurate Recognition and Retrieval of Handwritten Text, 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), Shenzhen, China, 2016. [PDF]


Codes

Link

 

Models

Pre Trained CNN Models

Pre Trained Attribute Models


References

[1] Praveen Krishnan and C.V. Jawahar, "Matching Handwritten Document Images", ECCV, 2016.

[2] J. Almazán, A. Gordo, A. Fornés, and E. Valveny, "Word spotting and recognition with embedded attributes", PAMI, 2014.

[3] Praveen Krishnan and C.V. Jawahar, "Generating Synthetic Data for Text Recognition", arXiv:1608.04224, 2016.

 

Associated People

  • Praveen Krishnan *
  • Kartik Dutta *
  • C.V. Jawahar

* Equal Contribution

Matching Handwritten Document Images


Abstract

We address the problem of predicting similarity between a pair of handwritten document images written by different individuals. This has applications related to matching and mining in image collections containing handwritten content. A similarity score is computed by detecting patterns of text re-usages between document images irrespective of the minor variations in word morphology, word ordering, layout and paraphrasing of the content. Our method does not depend on an accurate segmentation of words and lines. We formulate the document matching problem as a structured comparison of the word distributions across two document images. To match two word images, we propose a convolutional neural network based feature descriptor. Performance of this representation surpasses the state-of-the-art on handwritten word spotting. Finally, we demonstrate the applicability of our method on a practical problem of matching handwritten assignments.


Problem Statement

In this work, we compute a similarity score by detecting patterns of text re-usages across documents written by different individuals irrespective of the minor variations in word forms, word ordering, layout or paraphrasing of the content.
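The notion of a similarity score that ignores word ordering can be illustrated with a deliberately simple sketch. This is not the measure proposed in this work: it assumes each document is already given as a list of word-image descriptors (here, short made-up vectors), counts a word as matched when its best cosine similarity in the other document reaches an arbitrary threshold, and averages the matched fractions in both directions. All names and the threshold value are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def matched_fraction(doc_a, doc_b, threshold=0.9):
    """Fraction of words in doc_a that have a close match anywhere in doc_b."""
    if not doc_a or not doc_b:
        return 0.0
    matched = sum(
        1 for u in doc_a if max(cosine(u, v) for v in doc_b) >= threshold
    )
    return matched / len(doc_a)


def similarity(doc_a, doc_b, threshold=0.9):
    """Symmetric score in [0, 1]; word order within a document is ignored."""
    return 0.5 * (
        matched_fraction(doc_a, doc_b, threshold)
        + matched_fraction(doc_b, doc_a, threshold)
    )


# Hypothetical descriptors: doc_2 contains the same words as doc_1, reordered.
doc_1 = [[1.0, 0.0], [0.0, 1.0]]
doc_2 = [[0.0, 1.0], [1.0, 0.0]]
doc_3 = [[1.0, 0.0], [1.0, 0.0]]
print(similarity(doc_1, doc_2))
print(similarity(doc_1, doc_3))
```

Since each word is matched against every word of the other document, reordering the words leaves the score unchanged, while a document that reuses only part of the content scores lower.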

 


 

 

Major Contributions

  • To address the lack of data for training on handwritten word images, we build a synthetic handwritten dataset of 9 million word images, referred to as the IIIT-HWS dataset.
  • We report a 56% error reduction in the word spotting task on the challenging IAM dataset and on pages from the George Washington collection.
  • We also propose a normalized feature representation for word images which is invariant to the different inflectional endings or suffixes present in words.
  • We demonstrate two immediate applications: (i) searching handwritten text in instructional videos, and (ii) comparing handwritten assignments.

IIIT-HWS dataset

 


Generating synthetic images is an art which emulates the natural process of image generation as closely as possible. In this work, we exploit such a framework for data generation in the handwritten domain. We render synthetic data using open source fonts and incorporate data augmentation schemes. As part of this work, we release a corpus of 9M synthetic handwritten word images, which could be useful for training deep network architectures and advancing the performance of handwritten word spotting and recognition.

Download link:

Description | Download Link | File Size
Readme file | Readme | 4.0 KB
IIIT-HWS image corpus | IIIT-HWS | 32 GB
Ground truth files | GroundTruth | 229 MB

 

 

Please cite the paper below if you use the dataset.

     

    HWNet Architecture


     

     

     

    Measure of document similarity (MODS)

     


     


    Datasets and Codes

    Please contact author


    Related Papers

    • Praveen Krishnan and C.V Jawahar - Matching Handwritten Document Images, The 14th European Conference on Computer Vision (ECCV) – Amsterdam, The Netherlands, 2016. [PDF]

     


    Contact