Recognition and Retrieval from Document Image Collections


Million Meshesha

The rapid growth in the digitization of books and manuscripts demands an immediate solution for accessing them electronically. Such a solution would make valuable archived materials searchable and usable. This requires research in document image understanding, specifically in document image recognition and document image retrieval. In the last three decades, significant advances have been made in the recognition of documents written in Latin-based and some Oriental scripts. There have been many excellent attempts at building robust document analysis systems in industry, academia and research labs. Intelligent recognition systems are commercially available for certain scripts. However, there has been only limited research effort on the recognition of indigenous scripts of African and Indian languages. In addition, the diversity of archived printed documents poses many challenges to document analysis and understanding. Hence, in this work, we explore novel approaches for understanding and accessing the content of document image collections that vary in quality and printing.

Around 2,500 languages are spoken in Africa. Some of these languages have their own indigenous scripts, in which a large body of printed documents is available in various institutions. Digitizing these documents makes it possible to apply already available language technologies to local information needs and development. We present an OCR for converting digitized documents in the Amharic language. Amharic is the official language of Ethiopia. An extensive literature survey reveals that this is the first attempt to report the challenges in recognizing indigenous African scripts and a possible solution for the Amharic script. Research on the recognition of Amharic script faces major challenges due to (i) the large number of characters used in writing and (ii) the existence of a large set of visually similar characters. Here we extract a set of optimal discriminant features to train the classifier. Recognition results are presented on real-life degraded documents such as books, magazines and newspapers to demonstrate the performance of the recognizer.

Present OCRs are typically designed to work on a single page at a time. We argue that the recognition scheme for a collection (like a book) could be considerably different from one designed for isolated pages. The motivation here is therefore to exploit all the information available during the recognition process, which earlier systems did not use effectively to enhance the performance of the recognizer. To this end, we propose an architecture and learning algorithms for a self-adaptable OCR framework for the recognition of document image collections. This approach enables the recognizer to learn incrementally and adapt to document image collections for performance improvement. We employ learning procedures to capture the relevant information available online, and feed it back to update the knowledge of the system. Experimental results show the effectiveness of our design for improving the performance of the recognizer on-the-fly, thereby adapting to a specific collection.
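The feedback loop described above can be illustrated with a minimal sketch. It assumes a nearest-centroid classifier over fixed-length feature vectors and a confidence margin for deciding which predictions are fed back; the class name `AdaptiveCentroidClassifier`, the margin rule and the toy features are hypothetical and are not taken from the thesis.

```python
from math import dist


class AdaptiveCentroidClassifier:
    """Nearest-centroid classifier that updates itself from confident predictions."""

    def __init__(self, seed_samples):
        # seed_samples: {label: [feature_vector, ...]} from an initial labelled set
        self.sums = {}
        self.counts = {}
        for label, vectors in seed_samples.items():
            for vector in vectors:
                self._add(label, vector)

    def _add(self, label, vector):
        # Accumulate running sums so centroids can be updated incrementally.
        if label not in self.sums:
            self.sums[label] = [0.0] * len(vector)
            self.counts[label] = 0
        self.sums[label] = [s + x for s, x in zip(self.sums[label], vector)]
        self.counts[label] += 1

    def centroid(self, label):
        n = self.counts[label]
        return [s / n for s in self.sums[label]]

    def predict(self, vector):
        # Return the closest label and the distance margin to the runner-up.
        ranked = sorted((dist(vector, self.centroid(label)), label) for label in self.sums)
        best_distance, best_label = ranked[0]
        second_distance = ranked[1][0] if len(ranked) > 1 else float("inf")
        return best_label, second_distance - best_distance

    def adapt(self, vectors, min_margin):
        # Classify a collection; feed confident samples back into the model.
        labels = []
        for vector in vectors:
            label, margin = self.predict(vector)
            if margin >= min_margin:
                self._add(label, vector)
            labels.append(label)
        return labels


clf = AdaptiveCentroidClassifier({"a": [[0.0, 0.0]], "b": [[10.0, 0.0]]})
before, _ = clf.predict([5.5, 0.0])
clf.adapt([[3.0, 0.0], [3.0, 0.0], [3.0, 0.0]], min_margin=2.0)
after, _ = clf.predict([5.5, 0.0])
print(before, after)
```

In this toy run the centroid of class `a` drifts toward the glyph shapes actually seen in the collection, so a borderline sample that was first assigned to `b` is later assigned to `a`. A real system would need a verification step before feedback, since wrong confident predictions would otherwise reinforce themselves.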

No robust OCR is available for the indigenous scripts of African and Indian languages, and designing such a system to access archived document images is a long-term process. Hence we explore a word spotting approach for retrieving printed document images without explicit recognition. To this end, we propose an effective word image matching scheme that achieves high performance in the presence of script variability, printing variations, degradations and word-form variations. A novel partial matching algorithm is designed for morphological matching of word-form variants in a language. We employ a feature extraction scheme that extracts local features by scanning vertical strips of the word image. These features are then combined based on their discriminatory potential. We present detailed experimental results of the proposed approach on English, Amharic and Hindi documents.
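A minimal sketch of strip-based word image matching follows. It assumes binary word images given as lists of pixel rows, uses upper-half and lower-half ink density as the per-strip features, and aligns the strip sequences with dynamic time warping (DTW). The feature choice, the DTW alignment and the prefix-only variant `prefix_dtw_cost` are illustrative assumptions, not the exact features or the partial matching algorithm of the thesis.

```python
def column_features(image):
    """Scan one-pixel-wide vertical strips; image is a list of rows of 0/1 (1 = ink)."""
    height = len(image)
    width = len(image[0])
    half = height // 2
    features = []
    for x in range(width):
        column = [image[y][x] for y in range(height)]
        upper = sum(column[:half]) / max(half, 1)
        lower = sum(column[half:]) / max(height - half, 1)
        features.append((upper, lower))
    return features


def _dtw_table(a, b):
    # Standard DTW cumulative cost table with an L1 local distance.
    inf = float("inf")
    table = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    table[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = abs(a[i - 1][0] - b[j - 1][0]) + abs(a[i - 1][1] - b[j - 1][1])
            table[i][j] = local + min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1])
    return table


def dtw_cost(a, b):
    """Length-normalised cost of aligning two whole strip sequences."""
    return _dtw_table(a, b)[len(a)][len(b)] / (len(a) + len(b))


def prefix_dtw_cost(query, target):
    """Cost of aligning the whole query with the best-matching prefix of the target."""
    table = _dtw_table(query, target)
    n = len(query)
    return min(table[n][j] / (n + j) for j in range(1, len(target) + 1))


stem = column_features([[1, 0, 1],
                        [0, 1, 1]])
variant = column_features([[1, 0, 1, 0, 0],
                           [0, 1, 1, 0, 1]])
print(dtw_cost(stem, variant), prefix_dtw_cost(stem, variant))
```

The whole-word cost penalises the extra strips of the longer variant, while the prefix cost ignores them, which is the intuition behind matching a stem against its suffixed word forms.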

Searching and retrieval from document image collections is challenging because of scalability issues and computational cost. We design an efficient indexing scheme to speed up the search for relevant document images. We identify the word set by clustering word images into groups based on their similarities. Each cluster groups together variations of a word in printing, morphology and quality. This is achieved by mapping IR principles (that are widely used in text processing) for relevance ranking. Once the index terms that define the content of a document are clustered, they are indexed using an inverted data structure. This data structure also provides scope for incremental clustering in a dynamic environment. The indexing scheme enables effective search and retrieval in the image domain that is comparable with text search engines. We demonstrate the application of the indexing process with the help of experimental results.
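The indexing idea can be sketched as follows, assuming each word image has already been assigned an integer cluster ID that plays the role of an index term. The TF-IDF weighting and the function names `build_index` and `search` are illustrative assumptions about how text IR principles might be mapped to the image domain; they are not the thesis implementation.

```python
from collections import Counter, defaultdict
from math import log


def build_index(documents):
    """documents: {doc_id: [cluster_id, ...]} -> inverted index {cluster_id: {doc_id: tf}}."""
    postings = defaultdict(dict)
    for doc_id, cluster_ids in documents.items():
        for cluster_id, term_frequency in Counter(cluster_ids).items():
            postings[cluster_id][doc_id] = term_frequency
    return postings


def search(postings, n_documents, query_cluster_ids):
    """Rank documents by a simple TF-IDF score over the query's cluster IDs."""
    scores = defaultdict(float)
    for cluster_id in query_cluster_ids:
        hits = postings.get(cluster_id, {})
        if not hits:
            continue
        # Rare clusters (few documents) contribute more to the score.
        idf = log(n_documents / len(hits)) + 1.0
        for doc_id, term_frequency in hits.items():
            scores[doc_id] += term_frequency * idf
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


pages = {"page1": [3, 3, 7], "page2": [7], "page3": [5]}
index = build_index(pages)
print(search(index, len(pages), [3, 7]))
```

Because the index maps cluster IDs to posting lists, a newly digitized page can be added by updating only the posting lists of the clusters it contains, which is what makes incremental operation possible in this sketch.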

The large-scale proliferation of document images demands an effective solution for accessing them at the collection level. In this work, we investigate machine learning and information retrieval approaches that suit this demand. Machine learning schemes are used to redesign the existing approach to OCR development. The recognizer is enabled to learn from its experience and improve its performance over time on document image collections. Information retrieval (IR) principles are mapped to construct an indexing scheme for efficient content-based search and retrieval from document image collections. An existing matching scheme is redesigned to undertake morphological matching in the image domain. Performance evaluation using datasets from different languages shows the effectiveness of our approaches. Extensions are recommended for future work to advance the state of the art in document image recognition and document image retrieval.

 

Year of completion:  2008
 Advisor : Prof. C.V. Jawahar

Related Publications

  • C. V. Jawahar, A. Balasubramanian, Million Meshesha and Anoop M. Namboodiri - Retrieval of Online Handwriting by Synthesis and Matching Proceeding of the International Journal on Pattern Recognition, 42(7), 1445-1457, 2009. [PDF]

  • Million Meshesha and C. V. Jawahar - Matching word image for content-based retrieval from printed document images Proceeding of the International Journal on Document Analysis and Recognition, IJDAR 11(1), 29-38, 2008. [PDF]

  • Million Meshesha and C.V. Jawahar - Self Adaptable Recognizer for Document Image Collections Proceeding of the Second International Conference on Pattern Recognition Machine Learning (PReMI'2007), pp. 560-567, 18-22 Dec, 2007, Kolkata. [PDF]

  • Million Meshesha and C. V. Jawahar - Indigenous Scripts of African Languages Proceeding of the African Journal of Indigenous Knowledge Systems, Vol. 6, Issue 2, pp. 132-142, 2007. [PDF]

  • Million Meshesha and C. V. Jawahar - Optical Character Recognition of Amharic Documents Proceeding of the African Journal of Information and Communication Technology ISSN 1449-2679, Volume 3, Number 3, December 2007. [PDF]

  • Pramod Sankar K., Million Meshesha and C. V. Jawahar - Annotation of Images and videos based on Textual Content without OCR, Workshop on Computation Intensive Methods for Computer Vision (in conjunction with ECCV 2006), 2006. [PDF]

  • A. Balasubramanian, Million Meshesha and C. V. Jawahar - Retrieval from Document Image Collections, Proceedings of Seventh IAPR Workshop on Document Analysis Systems, 2006 (LNCS 3872), pp 1-12. [PDF]

  • Sachin Rawat, K. S. Sesh Kumar, Million Meshesha, Indineel Deb Sikdar, A. Balasubramanian and C. V. Jawahar - A Semi-Automatic Adaptive OCR for Digital Libraries, Proceedings of Seventh IAPR Workshop on Document Analysis Systems, 2006 (LNCS 3872), pp 13-24. [PDF]

  • Million Meshesha and C. V. Jawahar  - Recognition of Printed Amharic Documents, Proceedings of Eighth International Conference on Document Analysis and Recognition(ICDAR), Seoul, Korea 2005, Vol 1, pp 784-788. [PDF]

  • C. V. Jawahar, Million Meshesha and A. Balasubramanian, Searching in Document Images, Proceedings of the Indian Conference on Vision, Graphics and Image Processing(ICVGIP), Dec. 2004, Calcutta, India, pp. 622--627. [PDF]

 


Computational Displays - On Enhancing Displays using Computation


Pawan Harish (homepage)

Displays have seen many improvements over the years. With advancements in pixel resolution, color gamut, vertical refresh rates and power consumption, displays today have become more personal and accessible devices for sharing and feeding information. Advancements in human-computer interaction have provided novel interactions with current displays. Touch panels are now common and provide better interaction with the cyberworld shown on a device. Displays today still have many shortcomings. These include a rectangular shape, low color gamut, low dynamic range, lack of simultaneous focus and context in a scene, lack of 3D viewing, etc. Efforts are being made to create better and more natural displays than the 2D flat rectangular screens we see today. Much research has gone into designing better displays, including 3D displays, focus and context displays, HDR displays, etc. Such efforts enhance a display directly using device-level technologies such as physical, chemical and metallurgical means. This approach has been taken in the past but proves expensive and is hard to scale to the better standards that may be needed in color, refresh rate, spatial and intensity resolution in the future. New paradigms must be explored to generate better displays that provide better user experiences and are easy to view and interact with. The conventional direct approach requires new physical properties to be constantly discovered in order to push the envelope in display design. Such technologies are hard to come by and may take years to prove their merit before they are accepted in the display design pipeline. An alternative is to use computation to enhance displays. Today computation is easily available and cheap. CPUs follow Moore's law in their ever-expanding compute capabilities. Multi-core CPU architectures are now common, even in handheld devices. GPUs with 1500 cores deliver 1.5 TFLOPS for $500 today.
This abundance of compute capability has been applied to various systems with much success. Many areas, including computational photography, computational biology and human-computer interaction, take advantage of computation to enhance a base system. Displays too have been shown to benefit from such an approach. Fish tank virtual reality is an example of this: 3D viewing can be enabled for a head-tracked viewer using standard displays in conjunction with computer graphics. Computation can thus be used to enhance displays or remove their shortcomings. Many displays are under development that combine computation with other means to provide better and more natural user experiences. These may combine computation with optics, display arrangement, metallurgy, sensors, etc. to bring out more than what is available on existing displays. Such displays can collectively fall under the term Computational Displays.
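As a small illustration of the fish tank virtual reality example, the sketch below computes the off-axis (asymmetric) view frustum for a tracked eye position in front of a flat screen. It assumes the screen lies in the plane z = 0, is centered at the origin, and that the eye is at z > 0 in the same units; the function name `off_axis_frustum` is hypothetical, and the returned bounds are in the form expected by a `glFrustum`-style projection.

```python
def off_axis_frustum(eye, screen_width, screen_height, near):
    """Return (left, right, bottom, top) at the near plane for a head-tracked viewer.

    eye: (x, y, z) position of the viewer relative to the screen center, with z > 0.
    """
    ex, ey, ez = eye
    if ez <= 0:
        raise ValueError("eye must be in front of the screen (z > 0)")
    # Project the screen edges, as seen from the eye, onto the near plane.
    scale = near / ez
    left = (-screen_width / 2 - ex) * scale
    right = (screen_width / 2 - ex) * scale
    bottom = (-screen_height / 2 - ey) * scale
    top = (screen_height / 2 - ey) * scale
    return left, right, bottom, top


centered = off_axis_frustum((0.0, 0.0, 2.0), 4.0, 2.0, 1.0)
shifted = off_axis_frustum((1.0, 0.0, 2.0), 4.0, 2.0, 1.0)
print(centered, shifted)
```

When the viewer is centered the frustum is symmetric; as the head moves sideways the frustum skews so that the rendered scene stays fixed relative to the physical screen, which produces the motion parallax cue.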

 

Year of completion:  June 2013
 Advisor : P. J. Narayanan


Related Publications

  • Pawan Harish, Parikshit Sakurikar, P J Narayanan - Increasing Intensity Resolution on a Single Display using Spatio-Temporal Mixing Proceedings of the 8th Indian Conference on Vision, Graphics and Image Processing, 16-19 Dec. 2012, Bombay, India. [PDF]

  • Pawan Harish and P. J. Narayanan - Designing Perspectively-Correct Multiplanar Display IEEE Transactions on Visualization and Computer Graphics, Vol.99, 2012. [PDF]

  • Pawan Harish and P.J. Narayanan - View Dependent Rendering to Simple Parametric Display Surfaces Proceedings of 17th Eurographics Symposium on Virtual Environments (EGVG 2011),20-21 Sep. 2011, ISBN 978-3-905674-33-0, pp. 27-30, Nottingham, UK. [PDF]

  • Nirnimesh, Pawan Harish and P. J. Narayanan - Garuda: A Scalable, Tiled Display Wall Using Commodity PCs Proc. of IEEE Transactions on Visualization and Computer Graphics (TVCG), Vol. 13, no. 5, pp. 864-877, 2007. [PDF]

  • Nirnimesh, Pawan Harish and P. J. Narayanan - Culling an Object Hierarchy to a Frustum Hierarchy, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338, pp. 252-263, 2006. [PDF]

  • Pawan Harish, Parikshit Sakurikar & P. J. Narayanan - Spatio-Temporal Mixing to Increase Intensity Resolution on a Single Display CVPR Workshop on Computational Cameras and Displays, 2012. (Poster)
  • Pawan Harish & P. J. Narayanan - A View Dependent, Polyhedral 3D Display ACM SIGGRAPH Virtual Reality Continuum and its Applications in Industry, 2009. [PDF]


Automatic Retinal Image Analysis for the detection of Glaucoma


Gopal Datt Joshi

Glaucoma is recognized to be the second most common cause of blindness. Early detection and treatment of glaucoma are hence important for preventing vision loss. Manual screening for glaucoma at a large scale is challenging because skilled manpower in ophthalmology is scarce. The research focus of this thesis is to investigate the potential of retinal image analysis in glaucoma assessment, towards developing an automatic glaucoma screening system.

The task of glaucoma detection in a retinal image involves the assessment of multiple retinal changes (indicators) associated directly or indirectly with the loss of underlying nerve fibers. Foremost, analyzing both intra- and peri-papillary indicators from the colour retinal image is of prime importance for gathering complete information. The intra-papillary indicators include retinal changes occurring within the optic disk (OD) region, whereas peri-papillary indicators account for the changes occurring in the periphery of the OD region. The OD has a small crater-like depression in its center, physiologically known as the cup region. The annular region formed between the cup and OD boundaries is called the rim, which represents the amount of nerve fiber bundles entering the OD from the different areas of the retina. The enlargement of the cup region, called cupping, is the most important intra-papillary indicator for assessing glaucomatous damage.
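The geometry described above can be made concrete with a short sketch. It assumes that binary masks of the OD and cup regions are already available as lists of pixel rows, derives the rim as the part of the OD outside the cup, and reports a cup-to-disk area ratio as one simple way to quantify cupping. The function name `rim_and_cup_ratio` and the choice of an area ratio are assumptions for illustration; the text above does not specify a particular measure.

```python
def rim_and_cup_ratio(disk_mask, cup_mask):
    """Return (rim_mask, cup_to_disk_area_ratio) from two same-sized 0/1 masks."""
    disk_area = 0
    cup_area = 0
    rim_mask = []
    for disk_row, cup_row in zip(disk_mask, cup_mask):
        rim_row = []
        for in_disk, in_cup in zip(disk_row, cup_row):
            # The cup is only counted where it lies inside the optic disk.
            disk_area += int(bool(in_disk))
            cup_area += int(bool(in_disk and in_cup))
            rim_row.append(int(bool(in_disk and not in_cup)))
        rim_mask.append(rim_row)
    if disk_area == 0:
        raise ValueError("disk mask is empty")
    return rim_mask, cup_area / disk_area


disk = [[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]]
cup = [[0, 0, 0],
       [0, 1, 0],
       [0, 0, 0]]
rim, ratio = rim_and_cup_ratio(disk, cup)
print(rim, ratio)
```

A larger ratio means the cup occupies more of the disk and the rim is thinner, which corresponds to more pronounced cupping.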

In this thesis, a set of solutions is developed for detecting intra-papillary indicators by segmenting the OD and cup regions from colour retinal images. A region-based active contour model is proposed to segment the OD. This extends the standard region-based active contour model by including localized image information as a support domain around each point of interest on the contour for accurate segmentation. The model is further strengthened by integrating information from multiple image feature channels. Experiments on a fairly large number of images show that the proposed method is more robust and accurate overall than two other existing methods, particularly in dealing with the challenges posed by peri-papillary atrophy regions and irregular OD shapes.

For cup segmentation, methods are designed both for a single monocular retinal image and for a sequentially acquired pair of monocular images. While the former uses vessel bend information to estimate the cup boundary, the latter uses multi-view geometry and a novel problem formulation. In this formulation, the objective is redefined as detecting the depth discontinuity in the retinal surface rather than computing precise depth/disparity, a crucial step in earlier approaches. In other words, cup boundary points are now defined by the depth discontinuities (edges) introduced by the cup region in the retinal surface. The proposed solution based on this formulation offers several advantages and more flexibility than earlier solutions. Our experimental results show that the proposed formulation applies to both simultaneously and sequentially acquired stereo image pairs, whereas earlier approaches were restricted to simultaneously acquired stereo pairs.

The presence of peri-papillary indicators such as peri-papillary atrophy (PPA) and retinal nerve fiber layer (RNFL) defect is considered useful for gaining confidence in the findings of intra-papillary indicators. However, the detection of peri-papillary indicators is generally considered challenging, mainly due to the high amount of inter-image variation and the uncertainty in their occurrence. A detection strategy is proposed that is motivated by the saliency these indicators exhibit relative to their local surround. This saliency aspect is represented by a set of context-based image features.

The information obtained from the analysis of intra- and peri-papillary indicators is finally integrated to arrive at a decision on the presence of glaucoma. The proposed integration is at the level of image features. Moreover, the regional-level distributions of local image structures are computed to achieve robustness against inter-image variations. The final classification step uses the aforementioned information to detect glaucoma in a retinal image. An extensive evaluation is carried out on a large test set of 1154 retinal images. The proposed system and existing CAD-based solutions are compared on various performance metrics. The designed system for glaucoma detection is assessed against three glaucoma fellows and two general ophthalmologists to understand its potential as a screening system.

The assessment results are encouraging, as the proposed system performs as well as general ophthalmologists. Our experimental results indicate that intra-papillary indicators play a more crucial role in the detection of glaucoma than peri-papillary indicators. However, the combination of these indicators leads to better detection performance. It is also noticed that the clinically identified intra-papillary measurements are not sufficient to capture the wide range of glaucomatous disk changes.

 

Year of completion:  July 2014
 Advisor : Prof. Jayanthi Sivaswamy

 


Related Publications

  • Jayanthi Sivaswamy, S R Krishnadas, Gopal Datt Joshi, Madhulika Jain, Ujjwal, Syed Tabish A. - Drishti-GS: Retinal Image Dataset for Optic Nerve Head (ONH) Segmentation Proceedings of the IEEE International Symposium on Biomedical Imaging, 29 April-2 May 2014, Beijing, China. [PDF]

  • Gopal Datt Joshi, Jayanthi Sivaswamy, Prashanth R and S R Krishnadas - Detection of Peri-papillary Atrophy and RNFL Defect from Retinal Images Proceedings of International Conference on Image Analysis and Recognition , Aveiro 2012. [PDF]

  • Gopal Datt Joshi, Jayanthi Sivaswamy and S. R. Krishnadas - Depth discontinuity-based cup segmentation from multi-view colour retinal images IEEE Transactions on Biomedical Engineering, 59(6), pp. 1523-1531, 2012. [PDF]

  • Gopal Datt Joshi, Jayanthi Sivaswamy and S.R. Krishnadas - Optic Disk and Cup Segmentation from Monocular Colour Retinal Images for Glaucoma Assessment IEEE Transactions on Medical Imaging, 30(1), pp. 1192-1205, June,2011. [PDF]

  • Gopal Datt Joshi, Rohit Gautam, Jayanthi Sivaswamy and S. R. Krishnadas - Robust Optic Disk Segmentation from Colour Retinal Images Proceedings of Seventh Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP'10),12-15 Dec. 2010,Chennai, India. [PDF]

  • Gopal Datt Joshi, Jayanthi Sivaswamy, Kundan Karan, Prashanth R and S.R. Krishnadas - Vessel bend-based cup segmentation in retinal images Proceedings of 20th International Conference on Pattern Recognition (ICPR'10),23-26 Aug. 2010, Istanbul, Turkey. [PDF]

  • Gopal Datt Joshi, Jayanthi Sivaswamy, Kundan Karan and S.R. Krishnadas - Optic Disk and Cup Boundary Detection Using Regional Information Proceedings of IEEE International Symposium on Biomedical Imaging: From Nano to Macro (ISBI'10), pp. 948-951, 14-17 April, 2010, Rotterdam, Netherlands. [PDF]

Clinical Abstracts

  • L. Chakrabarty, R. Krishnadas, G. D. Joshi, J. Sivaswamy - Automated Fundus Image Assessment:A Novel Screening Method for Glaucoma, 28th Congress of the Asia-Pacific Academy of Ophthalmology (APAO-AIOS), Hyderabad, 2013 & ASIA-ARVO, New Delhi, 2013.
