Document Enhancement Using Text Specific Prior

Jyotirmoy Banerjee

Document images are often obtained by digitizing paper documents such as books or manuscripts. They can be poor in appearance because of degraded paper quality, spreading and flaking of ink or toner, and imaging artifacts. These phenomena lead to different types of noise at the word level, including boundary erosion, dilation, cuts/breaks and merges of characters. Further, with the advent of modern electronic gadgets like PDAs, cellular phones, and digital cameras, the scope of document imaging has widened, and document image analysis systems are becoming increasingly visible in everyday life. For instance, one may be interested in systems that process, store, and understand document images captured by cellular phones. The processing challenges in this class of documents are considerably different from those of conventional scanned document images, because many of these documents are characterized by low resolution and poor quality. Super-resolution provides an algorithmic solution to the resolution enhancement problem by exploiting image-specific a priori information.
In this thesis we study and propose new methods for restoration and resolution enhancement of document images. We present a single-image super-resolution algorithm for gray-level document images that does not use any training set. Super-resolution of document images is characterized by bimodality, smoothness along the edges, and subsampling consistency. These characteristics are enforced in a Markov Random Field (MRF) framework by defining an appropriate energy function. Subsampling the super-resolved image returns the original low-resolution image, so the result stays consistent with the observed data. The restored image is generated by iteratively reducing the energy function of the MRF, which is a nonlinear optimization problem. This is a single-frame approach and is useful when multiple low-resolution images are not available.
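As a rough illustration of how bimodality, smoothness, and subsampling consistency could be combined into one energy, the sketch below scores a candidate high-resolution image against a low-resolution observation. It is not the energy function defined in the thesis: the function names, the weights, the two intensity modes (0.0 for text, 1.0 for background), block-average subsampling, and the plain neighbour-difference smoothness term (a stand-in for smoothness along the edges) are all assumptions made for illustration.

```python
def downsample(hr, factor):
    """Average each factor x factor block (assumed subsampling model)."""
    rows, cols = len(hr) // factor, len(hr[0]) // factor
    lr = []
    for i in range(rows):
        row = []
        for j in range(cols):
            block = [hr[i * factor + di][j * factor + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        lr.append(row)
    return lr


def energy(hr, lr, factor, mu_text=0.0, mu_bg=1.0,
           w_bimodal=1.0, w_smooth=0.1, w_data=10.0):
    """Hypothetical energy: bimodality + smoothness + subsampling consistency."""
    h, w = len(hr), len(hr[0])

    # Bimodality: zero when a pixel sits exactly on one of the two modes.
    bimodal = sum((v - mu_text) ** 2 * (v - mu_bg) ** 2
                  for row in hr for v in row)

    # Smoothness: squared differences between right and lower neighbours.
    smooth = 0.0
    for i in range(h):
        for j in range(w):
            if j + 1 < w:
                smooth += (hr[i][j] - hr[i][j + 1]) ** 2
            if i + 1 < h:
                smooth += (hr[i][j] - hr[i + 1][j]) ** 2

    # Subsampling consistency: the downsampled candidate should match lr.
    down = downsample(hr, factor)
    data = sum((down[i][j] - lr[i][j]) ** 2
               for i in range(len(lr)) for j in range(len(lr[0])))

    return w_bimodal * bimodal + w_smooth * smooth + w_data * data


if __name__ == "__main__":
    lr = [[0.0, 1.0]]                       # 1 x 2 observation
    sharp = [[0.0, 0.0, 1.0, 1.0],
             [0.0, 0.0, 1.0, 1.0]]          # crisp edge, consistent with lr
    blurry = [[0.5] * 4, [0.5] * 4]         # uniform gray, inconsistent with lr
    print(energy(sharp, lr, 2), energy(blurry, lr, 2))
```

With these assumed weights the crisp candidate receives a lower energy than the uniform gray one, because it violates neither the bimodality term nor the subsampling constraint. An iterative minimizer would move pixel values in the direction that reduces this score.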
Document images are repetitive in structure, since characters and words occur more than once in a page or book. The extraction of a single high-quality text image from a set of degraded images benefits from this a priori information. Character segmentation is performed to extract the characters. A total variation based prior model is used in a Maximum A Posteriori (MAP) estimate to smooth the edges and preserve the corners that are so characteristic of text images. Dependence on character segmentation remains a bottleneck, however. Character segmentation is not a completely solved problem, and its accuracy depends on the amount of noise in the text image.
Our next approach overcomes the dependency on character segmentation. It does not perform an explicit character segmentation, but still uses the repetitive component nature of document images. In document images, degradation varies from place to place, and context plays an important role in textual image understanding. An MRF framework that exploits the contextual relation between image patches is proposed. Using the topological/spatial constraints between the image patches, impossible combinations are eliminated from the initial set of matchings, resulting in an unambiguous textual output. The local consistency is adjusted to the global consistency using the belief propagation algorithm. As we work with patches and not characters, we avoid performing an explicit segmentation. The ability to work with larger patch sizes allows us to deal with severe degradations including cuts, blobs, merges and vandalized documents. This approach can also integrate document restoration and super-resolution into a single framework, thus directly generating high-quality images from degraded documents.
To conclude, the thesis presents an approach for reconstructing document images. Unlike conventional reconstruction methods, the unknown pixel values are estimated not from their local surrounding neighbourhood but from the whole image, by exploiting the multiple occurrences of characters in the scanned document. The advantage over conventional approaches is that more information is available, which leads to a better enhancement of the document image. Experimental results show significant improvement in image quality on document images collected from various sources, including magazines and books, and demonstrate the robustness and adaptability of the approach.
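The total variation prior used in the MAP estimate can be pictured with a small sketch. The code below is only an illustration under stated assumptions, not the formulation used in the thesis: it assumes an anisotropic total variation, Gaussian noise (so the data term is a sum of squared differences), several aligned degraded instances of the same character, and hypothetical names such as `map_cost` and `tv_weight`.

```python
def total_variation(img):
    """Anisotropic TV: sum of absolute differences between neighbours."""
    h, w = len(img), len(img[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if j + 1 < w:
                tv += abs(img[i][j + 1] - img[i][j])
            if i + 1 < h:
                tv += abs(img[i + 1][j] - img[i][j])
    return tv


def map_cost(estimate, observations, tv_weight=0.5):
    """Negative log-posterior up to a constant (assumed Gaussian noise).

    The data term compares the estimate with every degraded instance;
    the prior term penalizes total variation of the estimate.
    """
    data = 0.0
    for obs in observations:
        for row_e, row_o in zip(estimate, obs):
            for e, o in zip(row_e, row_o):
                data += (e - o) ** 2
    return data + tv_weight * total_variation(estimate)


if __name__ == "__main__":
    clean = [[0, 0, 1, 1],
             [0, 0, 1, 1]]      # one straight edge
    noisy = [[0, 1, 0, 1],
             [0, 0, 1, 1]]      # same patch with two flipped pixels
    observations = [clean, noisy]
    print(map_cost(clean, observations), map_cost(noisy, observations))
```

Both candidates are equally far from the pair of observations, so the data term alone cannot separate them; the total variation term favours the candidate with the single clean edge. A sharp step costs no more total variation than a gradual ramp of the same height, which is one reason this kind of prior is often chosen when edges and corners need to be preserved.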


Year of completion: 2009
Advisor: C. V. Jawahar

Related Publications

  • Jyotirmoy Banerjee, Anoop M. Namboodiri and C. V. Jawahar - Contextual Restoration of Severely Degraded Document Images in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 09), June 20-25, 2009, Miami Beach, Florida, USA.

  • Jyotirmoy Banerjee and C. V. Jawahar - Super-resolution of Text Images Using Edge-Directed Tangent Field in Eighth IAPR Workshop on Document Analysis Systems (DAS), Sep 17-19, 2008, Nara, Japan. [Received Honorable Mention Award]

  • Jyotirmoy Banerjee and C. V. Jawahar - Restoration of Document Images Using Bayesian Inference in National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), Jan 11-13, 2008, Gandhinagar, Gujarat, India.