The IIIT Scene Text Retrieval (STR) Dataset

[Project Page]



The IIIT STR dataset is harvested from Google and Flickr image search. Query words such as coffee shop, motel, post office, high school and department were used to collect the images. Additionally, query words such as sky and building were used on Flickr to collect random distractors (images not containing text). The dataset contains 10,000 images in all. Each image is manually annotated to indicate whether or not it contains a query word. Annotations for all 50 query words used in our paper are available. Each query word appears 10-50 times in the dataset.




Anand Mishra, Karteek Alahari and C. V. Jawahar.
Image Retrieval using Textual Cues
ICCV 2013 [PDF]


If you use this dataset, please cite:

  @InProceedings{mishra2013image,
  author    = "Mishra, A. and Alahari, K. and Jawahar, C.~V.",
  title     = "Image Retrieval using Textual Cues",
  booktitle = "ICCV",
  year      = "2013",
  }

Related datasets


For any queries about the dataset, feel free to contact Anand Mishra.


Image Retrieval using Textual Cues

Anand Mishra, Karteek Alahari and C V Jawahar

ICCV 2013


We present an approach to the text-to-image retrieval problem based on the textual content present in images. Given the recent developments in understanding text in images, an appealing approach to this problem is to localize and recognize the text, and then query the database, as in a text retrieval problem. We show that such an approach, despite being based on state-of-the-art methods, is insufficient, and propose a method that does not rely on an exact localization and recognition pipeline. We take a query-driven search approach, in which we find approximate locations of the characters in the text query and then impose spatial constraints to generate a ranked list of images in the database. The retrieval performance is evaluated on public scene text datasets as well as on three large datasets that we introduce: IIIT scene text retrieval, Sports-10K and TV series-1M.


  • Query-driven approach for scene-text-based image retrieval
  • Scene text indexing without explicit text localization and recognition
  • Category as well as instance retrieval
  • The largest datasets for the problem


ICCV 2013 Paper / Poster


IIIT Scene Text Retrieval (IIIT STR)

Video Scene Text Retrieval Datasets (TV series-1M and Sports-10K)

Extended results



  @InProceedings{mishra2013image,
  author    = "Mishra, A. and Alahari, K. and Jawahar, C.~V.",
  title     = "Image Retrieval using Textual Cues",
  booktitle = "Proceedings of IEEE International Conference on Computer Vision",
  year      = "2013",
  }


Anand Mishra is partly supported by the MSR India PhD Fellowship 2012.

Copyright Notice

The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright.

Culling an Object Hierarchy to a Frustum Hierarchy


Visibility culling of a scene is central to any interactive graphics application. The idea is to limit the geometry sent down the rendering pipeline to only the geometry with a fair chance of finally becoming visible. It is important for the culling stage to be fast for it to be effective; otherwise the performance gain achieved will be overshadowed. Hierarchical scene structures are commonly used to speed up the process. Hierarchical culling of bounding boxes to a view frustum is fast and sufficient in most applications. However, when there are multiple view frustums (as in a tiled display wall), visibility culling time becomes substantial and cannot be hidden by pipelining it with other stages of rendering.

Here, we address the problem of culling an object hierarchy to a hierarchically organized set of frustums, such as those found in tiled displays and shadow volume computation. We present an adaptive algorithm to unfold the Object Hierarchy (OH) or Frustum hierarchy (FH) at every stage in the culling procedure. Our algorithm computes from-point visibility and is conservative. The precomputation required is minimal, allowing our approach to be applied for dynamic scenes as well.

Design & Features

Features of the Adaptive OH and FH Culling Algorithm

  • Faster running time than existing solutions based entirely on either OH or FH
  • Use of hierarchies potentially eliminates half of the objects and frustums at each step
  • Sublinear running time that adapts to the viewpoint and the "goodness" of the scene
  • Only simple PCA-based preprocessing is required to compute oriented bounding boxes
  • Works with OBBs, AABBs or bounding spheres
  • Scales sublinearly with both object and frustum count, making it suitable for very large numbers of objects and large tiled displays
  • Supports dynamic objects, since only a cheap OBB/AABB computation is needed every frame
  • Easily extendable to a generalized arrangement of view frustums
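The PCA-based preprocessing mentioned above amounts to fitting an oriented bounding box using the eigenvectors of a point cloud's covariance matrix as the box axes. The following is a minimal illustrative sketch, not the project's actual code:

```python
import numpy as np

def pca_obb(points):
    """Fit an oriented bounding box (OBB) to a point cloud via PCA.

    The box axes are the eigenvectors of the covariance matrix of the
    points; the extents are the point spreads along those axes.
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    centered = points - centroid
    cov = np.cov(centered, rowvar=False)
    # Eigenvectors of the symmetric covariance give the box orientation.
    _, axes = np.linalg.eigh(cov)
    # Project the points onto the axes to find the extents.
    proj = centered @ axes
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    center = centroid + axes @ ((lo + hi) / 2.0)
    half_extents = (hi - lo) / 2.0
    return center, axes, half_extents
```

Since only a covariance matrix and a 3x3 eigendecomposition are needed, this can be recomputed every frame for dynamic objects, which is why the approach suits animated scenes.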

We first find all the objects in the primary view frustum and then call the adaptive algorithm for every object present inside it. The adaptive algorithm is recursive in nature. The basic steps are:

Algorithm Adaptive Culling. Inputs: Object Node and Frustum Node

  • If the Frustum node is a leaf, mark the Object node visible to this frustum and return.
  • [L, C, R] = classify the children of the Object node according to their orientation with respect to the Frustum node's bisection plane.
  • Send all objects in set C to both sub-frustums of the Frustum node, applying this algorithm recursively.
  • Send all objects in set L to the left sub-frustum, applying this algorithm recursively.
  • Send all objects in set R to the right sub-frustum, applying this algorithm recursively.

The time complexity of the algorithm is roughly O(min(N log M, M log N)), where N is the number of objects and M is the number of view frustums.
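The recursion above can be sketched in a few lines. The sketch below makes simplifying assumptions: bounding spheres instead of OBBs, a hypothetical two-tile frustum tree, and classification by signed distance to the bisection plane. The actual implementation supports OBBs/AABBs and unfolds both hierarchies adaptively.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class FrustumNode:
    # Internal nodes carry a bisection plane (normal, offset); leaves a tile id.
    normal: np.ndarray = None
    offset: float = 0.0
    left: "FrustumNode" = None
    right: "FrustumNode" = None
    tile: int = None

@dataclass
class ObjectNode:
    center: np.ndarray          # bounding-sphere center
    radius: float               # bounding-sphere radius
    children: list = field(default_factory=list)
    visible_in: set = field(default_factory=set)

def adaptive_cull(obj, frustum):
    """Recursively distribute an object node over a frustum hierarchy."""
    if frustum.tile is not None:          # frustum leaf: object is visible here
        obj.visible_in.add(frustum.tile)
        return
    nodes = obj.children or [obj]         # unfold the object hierarchy one level
    for child in nodes:
        d = float(np.dot(frustum.normal, child.center)) - frustum.offset
        if d > child.radius:              # entirely on the right of the plane (set R)
            adaptive_cull(child, frustum.right)
        elif d < -child.radius:           # entirely on the left (set L)
            adaptive_cull(child, frustum.left)
        else:                             # straddles the plane (set C): send to both
            adaptive_cull(child, frustum.left)
            adaptive_cull(child, frustum.right)
```

Objects falling entirely on one side of a bisection plane skip half of the frustum subtree at every level, which is where the sublinear behaviour comes from.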

Related Publication

  • Nirnimesh, Pawan Harish and P. J. Narayanan - Garuda: A Scalable, Tiled Display Wall Using Commodity PCs, IEEE Transactions on Visualization and Computer Graphics (TVCG), Vol. 13, No. 5, pp. 864-877, 2007. [PDF]

  • Nirnimesh, Pawan Harish and P. J. Narayanan - Culling an Object Hierarchy to a Frustum Hierarchy, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338, pp. 252-263, 2006. [PDF]

Associated People

IIIT-Seed dataset

The IIIT-Seed (Semantic Edges) dataset contains 500 RGBD images of varying complexity. It consists of objects such as tables, chairs, cupboard shelves, boxes and household objects, in addition to walls and floors. The dataset is reported in the paper cited below:

Nishit Soni, Anoop M. Namboodiri, CV Jawahar and Srikumar Ramalingam. Semantic Classification of Boundaries of an RGBD Image. In Proceedings of the British Machine Vision Conference (BMVC 2015), pages 114.1-114.12. BMVA Press, September 2015. [paper] [abstract] [poster] [code] [bibtex]

  • Download the dataset: here. Cite our paper if you use this dataset. [bibtex]
  • Test and train files: test split and train split.
  • Download the 100 images of the NYU dataset used to test our approach: here.
  • Pb edge links for our dataset as well as the NYU dataset: here.
  • Ground truth for both datasets: here.
  • readme.txt

Biometric Authentication


Biometrics deals with recognizing people based on their physiological or behavioral characteristics. Our work primarily concentrates on three different aspects in biometrics:

  1. Enhancing Weak Biometrics for Authentication: Weak biometrics (hand geometry, face, voice, keystrokes) are traits that possess low discriminating content and change over time for each individual. However, several properties of weak biometrics, such as social acceptability, ease of sensing, and lack of privacy concerns, make them ideally suited for civilian applications. The methods we have developed can effectively handle the low discriminative power and low feature stability of weak biometrics, as well as the time-varying population in civilian applications.
  2. Writer Identification from Handwritten Documents: Handwriting is a behavioural biometric that contains distinctive traits acquired by a person over time. Traditional approaches to writer identification try to compute feature vectors that capture traits of handwriting known to experts as discriminative. In contrast, we concentrate on the automatic extraction of features suited to specific applications, such as writer identification in the civilian domain and problems such as forgery and repudiation in forensics.
  3. Use of Camera as a Biometric Sensor: Cameras have been used to capture face images for authentication in the past. However, for biometric traits such as fingerprints and iris, a specialized sensor is often preferred due to the high quality of the data it provides. Recent advances in image sensors have made digital cameras both inexpensive and technically capable of capturing high-quality images. However, problems such as variations in pose, illumination and scale restrict the use of cameras as sensors for many biometric traits. We are working on using models of the imaging process to overcome these problems and capture high-quality data for authentication.

Enhancing Weak Biometric based Authentication


Weak biometrics (hand geometry, face, voice, keystrokes) are traits that possess low discriminating content and change over time for each individual; systems based on them are therefore less accurate than those based on strong biometrics (e.g., fingerprints, iris, retina). However, due to the exponentially decreasing cost of hardware and computation, biometrics has found immense use in civilian applications (time and attendance monitoring, physical access to buildings, human-computer interfaces, etc.) beyond forensics (e.g., criminal and terrorist identification). Various factors need to be considered when selecting a biometric trait for a civilian application, the most important of which relate to user psychology, acceptability and affordability. For these reasons, weak biometric traits are often better suited to civilian applications than strong ones. In this project, we address issues such as the low and unstable discriminating information present in weak biometrics, and the variations in user population in civilian applications.

Due to the low discriminating content of weak biometric traits, they show poor performance during verification. We have developed a novel feature selection technique called Single Class Hierarchical Discriminant Analysis (SCHDA), specifically for authentication in biometric systems. SCHDA builds an optimal user-specific discriminant space for each individual, in which the samples of the claimed identity are well separated from the samples of all other users.
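The full SCHDA formulation is beyond this summary. As a hedged illustration of the underlying idea of a user-specific discriminant direction (claimed user versus everyone else), the sketch below uses a plain two-class Fisher discriminant; it is a stand-in for exposition, not the SCHDA algorithm itself:

```python
import numpy as np

def user_discriminant(user_samples, other_samples):
    """Fisher-style discriminant direction separating one user's
    feature vectors from those of all other users.

    Illustrative only: SCHDA builds a hierarchical, user-specific
    space; this sketch just shows the one-vs-rest discriminant idea.
    """
    X1 = np.asarray(user_samples, dtype=float)
    X2 = np.asarray(other_samples, dtype=float)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled within-class scatter, regularized for numerical stability.
    Sw = np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)
    Sw += 1e-6 * np.eye(Sw.shape[0])
    w = np.linalg.solve(Sw, m1 - m2)     # Fisher direction
    return w / np.linalg.norm(w)
```

At verification time, samples would be projected onto this per-user direction and thresholded, so each enrolled identity gets the features that best separate it from the rest of the population.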

The second problem that leads to low authentication accuracy is the poor stability or permanence of weak biometric traits, due to factors such as ageing or the person gaining or losing weight. Civilian applications usually operate in a cooperative or monitored mode, in which users can give feedback to the system when errors occur. An intelligent adaptive framework uses this feedback to incrementally update the parameters of the feature selection and verification framework for each individual.

The third factor we explore to improve the performance of an authentication system for civilian applications is the pattern of participation of each enrolled user. As new users are enrolled, performance degrades due to the increasing number of users. An interesting observation is that although the number of users enrolled in the system may be very high, the number who regularly participate in the authentication process is comparatively low. We model the variation in the participating population using Markov models. The prior probability of participation of each individual is computed and incorporated into the feature selection framework, giving more weight to the parameters of regularly participating users. Both structured and unstructured modes of variation in participation are explored.
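As a minimal illustration of this idea (not the authors' exact model), a two-state Markov chain over "absent"/"present" sessions yields a per-user participation prior as the chain's stationary probability of the "present" state:

```python
import numpy as np

def participation_prior(history):
    """Estimate a user's prior probability of participating from a 0/1
    attendance history, via a two-state Markov chain.

    Transition frequencies are counted with Laplace smoothing; the
    stationary probability of state 1 ("present") is returned using the
    closed form for a two-state chain.
    """
    h = np.asarray(history, dtype=int)
    counts = np.ones((2, 2))                 # Laplace smoothing
    for a, b in zip(h[:-1], h[1:]):
        counts[a, b] += 1
    P = counts / counts.sum(axis=1, keepdims=True)
    p01, p10 = P[0, 1], P[1, 0]              # absent->present, present->absent
    return p01 / (p01 + p10)                 # stationary P(present)
```

A regular participant's prior approaches 1, so their model parameters dominate the feature selection, while rarely seen users contribute little.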

Text Independent Writer Identification from Online Handwriting

Handwriting individuality is a quantitative measure of writer-specific information that can be used to identify the authorship of documents, compare writing habits, and evaluate the significance of their similarities and differences. It is a discriminative process, like fingerprint identification, firearms identification and DNA analysis. Individuality in handwriting lies in the habits that are developed and become consistent to some degree in the process of writing.

The discriminating elements of handwriting lie in various factors, such as 1) arrangement, connections, constructions, design, dimensions, slant or slope, spacing, and class and choice of allographs; 2) language styles such as abbreviations, commencements and terminations, diacritics and punctuation, line continuity, and line quality or fluency; 3) physical traits such as pen control, pen hold, pen position, pen pressure and writing movement; 4) consistency, or natural variation and persistence; and 5) lateral expansion and word proportions.

The framework that we utilize tries to capture the consistent information at various levels and automatically extract discriminative features from them.

Features of our Approach:

  • Text-independent algorithm: the writer can be identified from any text in the underlying script; features are not compared only between similar characters.
  • Script-dependent framework: applicability has been verified on different scripts, including Devanagari, Arabic, Roman, Chinese and Hebrew.
  • Use of online information: online data is used for verification. Offline information can also be used in a similar framework, with appropriate changes to feature extraction.
  • Authentication with a small amount of data: with around 12 words in Devanagari, we achieve an accuracy of 87%.

Underlying process of identification:


  • Primitive Definition:

    Primitives are the discriminative features of handwritten documents. The first step is to identify the primitives. Primitives can be individuality features such as the size, shape and distribution of curves in a handwritten document. We choose sub-character-level curves as basic primitives.

  • Extraction and Representation of primitive:

    Primitives are extracted using the velocity profile of the stroke, as shown in the figure. Minimum-velocity points are the critical points of a primitive. Primitives are represented using size and shape features, as shown in the diagram.

  • Identification of Consistent Primitives:

    Repeating curves are consistent primitives. To extract consistent curves, an unsupervised clustering algorithm is used to group them.

  • Classification:

    The variation in the distribution, size and shape of the curves in each cluster is used to discriminate one writer from others.
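The extraction step above can be illustrated with a small sketch: split an online stroke at local minima of its pen-speed profile and describe each resulting curve with simple size and shape features. The feature set here is illustrative, not the paper's exact descriptor; the identified primitives would then be grouped by an unsupervised clustering step (e.g. k-means) and compared across writers.

```python
import numpy as np

def segment_at_velocity_minima(stroke, dt=1.0):
    """Split an online stroke (N x 2 array of pen positions) into
    sub-character curves at local minima of the pen-speed profile,
    which serve as the critical points between primitives."""
    v = np.linalg.norm(np.diff(stroke, axis=0), axis=1) / dt
    minima = [i + 1 for i in range(1, len(v) - 1)
              if v[i] < v[i - 1] and v[i] <= v[i + 1]]
    return np.split(stroke, minima)

def curve_features(curve):
    """Simple size/shape descriptor for one primitive: bounding-box
    width and height, plus the total absolute turning angle."""
    w, h = curve.max(axis=0) - curve.min(axis=0)
    d = np.diff(curve, axis=0)
    angles = np.arctan2(d[:, 1], d[:, 0])
    turn = np.abs(np.diff(angles)).sum() if len(angles) > 1 else 0.0
    return np.array([w, h, turn])
```

Each writer is then characterized by how their primitives distribute over the learned clusters, which is what the classification step compares.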

Related Publications

  • Vandana Roy, C. V. Jawahar: Feature Selection for Hand-Geometry based Person Authentication, in Proceedings of International Conference on Advanced Computing and Communication, Coimbatore, India, Dec. 2005.
  • Vandana Roy, C. V. Jawahar: Hand-Geometry Based Person Authentication Using Incremental Biased Discriminant Analysis, in Proceedings of National Conference on Communications, New Delhi, Jan. 2006.
  • Anoop Namboodiri, Sachin Gupta: Text Independent Writer Identification for Online Handwriting, in Proceedings of the International Workshop on Frontiers in Handwriting Recognition (IWFHR'06), La Baule, France, October 2006.
  • Vandana Roy, C. V. Jawahar: Modeling Time-Varying Population for Biometrics, To appear in Proceedings of International Conference on Computing: Theory and Applications, Kolkata, India, March 2007.

Associated People