We present an approach for the text-to-image retrieval
problem based on textual content present in images. Given
the recent developments in understanding text in images, an
appealing approach to address this problem is to localize
and recognize the text, and then query the database, as in a
text retrieval problem. We show that such an approach, despite
being based on state-of-the-art methods, is insufficient,
and propose a method that does not rely on an exact
localization and recognition pipeline. We take a query-driven
search approach, where we find approximate locations of
characters in the text query, and then impose spatial con-
straints to generate a ranked list of images in the database.
The retrieval performance is evaluated on public scene text
datasets as well as three large datasets, namely IIIT scene
text retrieval, Sports-10K and TV series-1M, which we introduce