Modeling Scene Text and Texture by Decomposing into Component Images

Siddharth Kherada

Separating an image into constituent components based on the source of the data, its frequency distribution, or its nature is a widely used technique in image processing and computer vision. Many problems are solved by partitioning images into components and working on each component separately. Common examples include splitting an image into red, green and blue channels for ease of representation, or into luminance and chrominance for better compression. In this thesis, we explore the separation of natural images into appropriate components for the purposes of representation and recognition. We first introduce a framework in which separating images into direct and global components helps in modeling 3D textures. These 3D textures are often described by a parametric function for each pixel that models the variation in its appearance with respect to lighting direction. However, parametric models such as Polynomial Texture Maps (PTMs) tend to smooth out changes in appearance. We therefore propose a technique to effectively model natural material surfaces and their interactions with changing lighting conditions. We show that the direct and global components of an image have different characteristics and that modeling them separately leads to a more accurate and compact model of the 3D surface texture. The direct component is mainly affected by the structural properties of the surface and therefore captures phenomena such as shadows and specularities, which are sharply varying functions. The global component is used to model overall luminance and color values, which vary smoothly. For a given lighting position, both components are computed separately and combined to render a new image. This method models sharp shadows and specularities while preserving the structural relief and surface color. The rendered images thus have enhanced photorealism compared to images rendered by existing single-pixel models such as PTMs.
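
As an illustration of the per-pixel parametric models discussed above, the following is a minimal sketch of a biquadratic Polynomial Texture Map fitted by least squares. It is not the thesis implementation. The function names (ptm_basis, fit_ptm, render_ptm, render_combined) are hypothetical, the light direction is assumed to be given as its two projected components (lu, lv), and combining the two components by addition is an assumption based on the usual definition of direct and global illumination, where the observed image is their sum. How the direct component is modeled for a new light direction is not specified here, so it is passed in as a precomputed image.

```python
import numpy as np

def ptm_basis(lu, lv):
    """Biquadratic PTM basis in the projected light direction (lu, lv)."""
    lu = np.asarray(lu, dtype=float)
    lv = np.asarray(lv, dtype=float)
    return np.stack([lu**2, lv**2, lu * lv, lu, lv, np.ones_like(lu)], axis=-1)

def fit_ptm(images, light_dirs):
    """Fit six PTM coefficients per pixel by least squares.

    images:     (N, H, W) intensities under N known light directions
    light_dirs: (N, 2) projected light directions (lu, lv)
    returns:    (6, H, W) coefficient maps
    """
    n, h, w = images.shape
    A = ptm_basis(light_dirs[:, 0], light_dirs[:, 1])          # (N, 6)
    coeffs, *_ = np.linalg.lstsq(A, images.reshape(n, -1), rcond=None)
    return coeffs.reshape(6, h, w)

def render_ptm(coeffs, lu, lv):
    """Evaluate the fitted polynomial for a new light direction."""
    return np.tensordot(ptm_basis(lu, lv), coeffs, axes=1)      # (H, W)

def render_combined(direct_image, global_coeffs, lu, lv):
    """Assumed combination: smooth PTM for the global component plus a
    separately obtained direct-component image for the same light."""
    return direct_image + render_ptm(global_coeffs, lu, lv)
```

Because the polynomial is only second order in the light direction, it can follow the smoothly varying global component well but cannot reproduce abrupt changes such as shadow boundaries, which is the motivation for treating the direct component separately.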

We then look at separating an image based on its sources of illumination or albedo variations for the purpose of scene text segmentation. Extracting text from scene images is a challenging task because of variations in the color, size, and font of the text, and the results are often affected by complex backgrounds, varying lighting conditions, shadows and reflections. A robust solution to this problem can significantly improve the accuracy of scene text recognition algorithms, enabling a variety of applications such as scene understanding, automatic localization and navigation, and image retrieval. We propose a method to extract and binarize text from images that contain complex backgrounds. We use Independent Component Analysis (ICA) to map out the text region, which is inherently uniform in nature, while removing shadows, specularities and reflections, which are treated as part of the background. The technique identifies the text regions from the components extracted by ICA and uses a simple global thresholding method to isolate the foreground text. We show the results of our algorithm on some of the most complex word images from the ICDAR 2003 Robust Word Recognition Dataset and compare them with previously reported methods.
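
The pipeline described above (ICA on the color channels, followed by global thresholding of the resulting components) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the published method: it uses a small NumPy implementation of symmetric FastICA with a tanh nonlinearity, it assumes Otsu's method as the "simple global thresholding", and it returns one binary candidate per component because the rule for choosing the text component is not given here. All function names are hypothetical.

```python
import numpy as np

def _sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W so that the rows of W are orthonormal."""
    d, E = np.linalg.eigh(W @ W.T)
    return (E / np.sqrt(d)) @ E.T @ W

def fast_ica(X, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA. X: (channels, n_samples); returns sources of the same shape."""
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))
    Z = (E / np.sqrt(d)).T @ X                      # whitened data
    rng = np.random.default_rng(seed)
    W = _sym_decorrelate(rng.normal(size=(X.shape[0], X.shape[0])))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        W_new = G @ Z.T / Z.shape[1] - np.diag((1 - G**2).mean(axis=1)) @ W
        W_new = _sym_decorrelate(W_new)
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1)) < tol
        W = W_new
        if converged:
            break
    return W @ Z

def otsu_threshold(values, bins=256):
    """Global threshold that maximizes the between-class variance (Otsu)."""
    hist, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist).astype(float)
    w1 = w0[-1] - w0
    s0 = np.cumsum(hist * centers)
    mu0 = s0 / np.maximum(w0, 1e-12)
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1e-12)
    return centers[np.argmax(w0 * w1 * (mu0 - mu1) ** 2)]

def ica_text_candidates(rgb):
    """Return three binary masks, one per independent component of an (H, W, 3) image."""
    h, w, _ = rgb.shape
    sources = fast_ica(rgb.reshape(-1, 3).T.astype(float))
    return [(s > otsu_threshold(s)).reshape(h, w) for s in sources]
```

ICA recovers components only up to sign and ordering, so each mask may show the text as either foreground or background, and a selection step is still needed to pick the component that contains the text. The whitening step also assumes the three channels are not perfectly correlated, as they would be in a pure grayscale image.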


Year of completion: December 2014
Advisor: Anoop M. Namboodiri


Related Publications

  • Siddharth Kherada and Anoop M. Namboodiri - An ICA based Approach for Complex Color Scene Text Binarization, Proceedings of the 2nd Asian Conference on Pattern Recognition, 05-08 Nov. 2013, Okinawa, Japan.

  • Siddharth Kherada, Prateek Pandey and Anoop M. Namboodiri - Improving Realism of 3D Texture using Component Based Modeling, Proceedings of the IEEE Workshop on Applications of Computer Vision, 9-11 Jan. 2012, Breckenridge, CO, USA, pp. 41-47, ISSN 1550-5790, E-ISBN 978-1-4673-0232-6, Print ISBN 978-1-4673-0233-3.