Understanding Text in Scene Images

Anand Mishra (homepage)


With the rapid growth of camera-based mobile devices, applications that answer questions such as, “What does this sign say?” are becoming increasingly popular. This is related to the problem of optical character recognition (OCR), where the task is to recognize text occurring in images. The OCR problem has a long history in the computer vision community. However, the success of OCR systems is largely restricted to text from scanned documents. Scene text, such as text occurring in images captured with a mobile device, exhibits large variability in appearance, and recognizing it remains challenging even for state-of-the-art OCR methods. Many scene understanding methods successfully recognize objects and regions such as roads, trees and sky in an image, but tend to ignore the text on the sign board. Towards filling this gap, we devise robust techniques for scene text recognition and retrieval in this thesis.

This thesis presents three approaches to address scene text recognition problems. First, we propose a robust text segmentation (binarization) technique, and use it to improve recognition performance. We pose binarization as a pixel labeling problem and define a corresponding novel energy function, which is minimized to obtain a binary segmentation image. This method makes it possible to use standard OCR systems for recognizing scene text. Second, we present an energy minimization framework that exploits both bottom-up and top-down cues for recognizing words extracted from street images. The bottom-up cues are derived from detections of individual text characters in an image. We build a conditional random field model on these detections to jointly model the strength of the detections and the interactions between them. The interactions are top-down cues obtained from a lexicon-based prior, i.e., language statistics. The optimal word represented by the text image is obtained by minimizing the energy function corresponding to the random field model. The proposed method significantly improves scene text recognition performance. Third, we present a holistic word recognition framework, which leverages the scene text image and synthetic images generated from lexicon words. We then recognize the text in an image by matching the scene and synthetic image features with our novel weighted dynamic time warping approach. This approach does not require any language statistics or language-specific character-level annotations.
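As a rough illustration of the holistic matching step, the sketch below implements a generic weighted dynamic time warping distance. The feature sequences and the per-dimension weight vector are hypothetical stand-ins for the image features and learned weights used in the thesis; this is a minimal sketch, not the thesis implementation.

```python
import numpy as np

def weighted_dtw(a, b, w):
    """Weighted DTW distance between feature sequences a (n x d) and
    b (m x d); w (d,) weights each feature dimension in the local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # weighted Euclidean local cost between frame i of a and frame j of b
            cost = np.sqrt(np.dot(w, (a[i - 1] - b[j - 1]) ** 2))
            # extend the cheapest of the three admissible warping moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

A scene word would then be assigned the lexicon word whose synthetic rendering yields the smallest warping distance.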

Finally, we address the problem of image retrieval using textual cues, and demonstrate large-scale text-to-image retrieval. Given the recent developments in understanding text in images, an appealing approach to this problem is to localize and recognize the text, and then query the database, as in a text retrieval problem. We show that this approach, despite being based on state-of-the-art methods, is insufficient, and propose an approach that does not rely on an exact localization and recognition pipeline. We take a query-driven search approach, where we find approximate locations of characters in the text query, and then impose spatial constraints to generate a ranked list of images in the database.
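The query-driven scoring can be sketched under simplifying assumptions: each query character has a list of candidate detections, each with an x-coordinate and a score (a hypothetical format), and the spatial constraint is reduced to left-to-right ordering. This is an illustrative sketch, not the method's actual scoring function.

```python
def score_image(query, detections):
    """Score an image for a text query. `detections` maps a character to
    candidate (x, score) locations found in the image. The image score is
    the best-scoring chain of candidates that spells the query left to right."""
    chains = list(detections.get(query[0], []))  # (x, accumulated score)
    for ch in query[1:]:
        nxt = []
        for x, s in detections.get(ch, []):
            # best chain ending strictly to the left of this candidate
            best = max((cs for cx, cs in chains if cx < x), default=None)
            if best is not None:
                nxt.append((x, best + s))
        chains = nxt
    return max((s for _, s in chains), default=0.0)
```

Ranking the database by this score yields a retrieval list without committing to an exact word-level localization or transcription.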

We evaluate our proposed methods extensively on a number of scene text benchmark datasets, namely Street View Text, ICDAR 2003, 2011 and 2013, and IIIT 5K-word, a new dataset we introduce, and show better performance than all comparable methods. The retrieval performance is evaluated on public scene text datasets as well as on three large datasets we introduce, namely IIIT scene text retrieval, Sports-10K and TV series-1M.


Year of completion:  December 2016
 Advisor : Prof. C.V. Jawahar & Dr. Karteek Alahari

Related Publications

  • Anand Mishra, Karteek Alahari and C. V. Jawahar - Enhancing energy minimization framework for scene text recognition with top-down cues - Computer Vision and Image Understanding (CVIU 2016), volume 145, pages 30–42, 2016. [PDF]

  • Anand Mishra, Karteek Alahari and C V Jawahar - Image Retrieval using Textual Cues Proceedings of International Conference on Computer Vision, 1-8 Dec. 2013, Sydney, Australia. [PDF] [Abstract] [Project page] [bibtex]

  • Vibhor Goel, Anand Mishra, Karteek Alahari, C V Jawahar - Whole is Greater than Sum of Parts: Recognizing Scene Text Words Proceedings of the 12th International Conference on Document Analysis and Recognition, 25-28 Aug. 2013, Washington DC, USA. [PDF] [Abstract] [bibtex]

  • Anand Mishra, Karteek Alahari and C V Jawahar - Scene Text Recognition using Higher Order Language Priors Proceedings of British Machine Vision Conference, 3-7 Sep. 2012, Guildford, UK. [PDF] [Abstract] [Slides] [bibtex]

  • Anand Mishra, Karteek Alahari and C V Jawahar - Top-down and Bottom-up Cues for Scene Text Recognition Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 16-21 June 2012, pp. 2287-2294, Providence RI, USA. [PDF] [Abstract] [Poster] [bibtex]

  • Anand Mishra, Karteek Alahari and C.V. Jawahar - An MRF Model for Binarization of Natural Scene Text Proceedings of 11th International Conference on Document Analysis and Recognition (ICDAR 2011), 18-21 September, 2011, Beijing, China. [PDF] [Abstract] [Slides] [bibtex]




Aligning Textual & Visual Data : Towards Scalable Multimedia Retrieval

Pramod Sankar Kompalli (homepage)

The search and retrieval of relevant images and videos from large repositories of multimedia is acknowledged as one of the hard challenges of computer science. With existing pattern recognition solutions, one cannot obtain a detailed, semantic description of a given multimedia document. Several limitations exist in feature extraction and classification schemes, along with the incompatibility of representations across domains. The situation will most likely remain so for several years to come.

Towards addressing this challenge, we observe that several multimedia collections have parallel information available that is: i) semantic in nature, ii) weakly aligned with the multimedia, and iii) freely available. For example, the content of a news broadcast is also available in the form of newspaper articles. If a correspondence could be obtained between the videos and such parallel information, one could access one medium using the other, which opens up immense possibilities for information extraction and retrieval. However, finding the mapping between the two sources of data is challenging, due to the unknown semantic hierarchy within each medium and the difficulty of matching information across the different modalities. In this thesis, we propose novel algorithms that address these challenges.

Different ⟨Multimedia, Parallel Information⟩ pairs require different alignment techniques, depending on the granularity at which entities can be matched across them. We choose four pairs of multimedia, along with parallel information in the text domain, such that the data is both challenging and available at large scale. Specifically, our multimedia consists of movies, broadcast sports videos and document images, with the parallel text coming from scripts, commentaries and language resources. As we proceed from one pair to the next, the problem grows in complexity, owing to a relaxation of the temporal binding between the parallel information and the multimedia. By addressing this challenge, we build solutions that perform increasingly fine-grained alignment between multimedia and text data.

The framework that we propose begins with the assumption that we can segment the multimedia and the text into meaningful entities that could correspond to each other. The problem, then, is to identify features and learn to match a text-entity to a multimedia-segment (and vice versa). Such a matching scheme can be refined using additional constraints, such as temporal ordering and occurrence statistics. We build algorithms that align i) movies and scripts, where sentences from the script are aligned to their respective video-shots, and ii) document images with a lexicon, where the words of the dictionary are mapped to clusters of word-images extracted from scanned books.

Further, we relax the above assumption, such that the segmentation of the multimedia is not available a priori. The problem now is to perform a joint inference of segmentation and annotation. We address this problem by building an over-complete representation of the multimedia. A large number of putative segmentations are matched against the information extracted from the parallel text, with the joint inference achieved through dynamic programming. This approach was successfully demonstrated on i) cricket videos, which were segmented and annotated with information from online commentaries, and ii) word-images, where sub-words called character N-grams are accurately segmented and labelled using the text-equivalent of the word.
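The joint inference over putative segmentations can be sketched as the following dynamic program. The `score(i, j, l)` interface, which rates how well multimedia units i..j-1 match the l-th text entity, is a hypothetical stand-in for the match functions used in the thesis; the DP itself is the standard ordered segmentation-alignment recurrence, shown here only as an illustration.

```python
def segment_and_annotate(score, n, k):
    """Partition n multimedia units into k contiguous segments, assigned in
    order to k text entities, maximizing the total match score (a DP sketch)."""
    NEG = float("-inf")
    best = [[NEG] * (k + 1) for _ in range(n + 1)]
    back = [[0] * (k + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for j in range(1, n + 1):          # segment end (exclusive)
        for l in range(1, k + 1):      # number of entities used so far
            for i in range(l - 1, j):  # previous segment boundary
                if best[i][l - 1] == NEG:
                    continue
                s = best[i][l - 1] + score(i, j, l - 1)
                if s > best[j][l]:
                    best[j][l], back[j][l] = s, i
    segments, j = [], n
    for l in range(k, 0, -1):          # trace back the chosen boundaries
        i = back[j][l]
        segments.append((i, j))
        j = i
    return list(reversed(segments)), best[n][k]
```

The same skeleton covers both demonstrations: units are video frames or shot candidates for cricket videos, and putative character-group cuts for word-images.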

As a consequence of the approaches proposed in this thesis, we were able to demonstrate text-based retrieval systems over large multimedia collections. The semantic level at which we can retrieve information was made possible by the annotation with parallel text information. Our work also results in a large set of labeled multimedia, which could be used by sophisticated machine learning algorithms for learning new concepts.


Year of completion:  May 2015
 Advisor : Prof. C. V. Jawahar

Related Publications

  • Pramod Sankar K, R Manmatha and C V Jawahar - Large Scale Document Image Retrieval by Automatic Word Annotation International Journal on Document Analysis and Recognition (IJDAR), Volume 17, Issue 1, pp. 1-17, 2014. [PDF]

  • Udit Roy, Naveen Sankaran, Pramod Sankar K. and C V Jawahar - Character N-Gram Spotting on Handwritten Documents using Weakly-Supervised Segmentation Proceedings of the 12th International Conference on Document Analysis and Recognition, 25-28 Aug. 2013, Washington DC, USA. [PDF]

  • Shrey Dutta, Naveen Sankaran, Pramod Sankar K and C V Jawahar - Robust Recognition of Degraded Documents Using Character N-Grams Proceedings of 10th IAPR International Workshop on Document Analysis Systems, 27-29 Mar. 2012, ISBN 978-1-4673-0868-7, pp. 130-134, Queensland, Australia. [PDF]

  • Sudha Praveen M., Pramod Sankar K., and C.V. Jawahar - Character n-Gram Spotting in Document Images Proceedings of 11th International Conference on Document Analysis and Recognition (ICDAR 2011),18-21 September, 2011, Beijing, China. [PDF]

  • Pramod Kompalli, C.V. Jawahar and R. Manmatha - Nearest Neighbor based Collection OCR Proceedings of Ninth IAPR International Workshop on Document Analysis Systems (DAS'10), pp. 207-214, 9-11 June, 2010, Boston, MA, USA. [PDF]

  • Pramod Sankar K, C. V. Jawahar and Andrew Zisserman - Subtitle-free Movie to Script Alignment Proceedings of British Machine Vision Conference (BMVC 09), 7-10 September, 2009, London, UK. [PDF]

  • Pramod Sankar K and C.V. Jawahar - Probabilistic Reverse Annotation for Large Scale Image Retrieval Proc. of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Minneapolis, Minnesota, 18-23 June, 2007. [PDF]

  • Pramod Sankar K. and C.V. Jawahar - Enabling Search over Large Collections of Telugu Document Images - An Automatic Annotation Based Approach, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338, pp. 837-848, 2006. [PDF]

  • Pramod Sankar K., Saurabh Pandey and C. V. Jawahar - Text Driven Temporal Segmentation of Cricket Videos, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338, pp. 433-444, 2006. [PDF]

  • Pramod Sankar K, Vamshi Ambati, Lakshmi Hari and C. V. Jawahar - Digitizing A Million Books: Challenges for Document Analysis, Proceedings of Seventh IAPR Workshop on Document Analysis Systems, 2006 (LNCS 3872), pp. 425-436. [PDF]






Computational Displays - On Enhancing Displays using Computation

Pawan Harish (homepage)

Displays have seen many improvements over the years. With advancements in pixel resolution, color gamut, vertical refresh rates and power consumption, displays have become more personal and accessible devices for sharing and consuming information. Advances in human-computer interaction have provided novel ways of interacting with current displays: touch panels are now common and provide better interaction with the cyberworld shown on a device. Still, displays today have many shortcomings, including a rectangular shape, low color gamut, low dynamic range, lack of simultaneous focus and context in a scene, and lack of 3D viewing. Efforts are being made to create better and more natural displays than the 2D flat rectangular screens we see today, and much research has gone into designing better displays, including 3D displays, focus-and-context displays and HDR displays. Such displays enhance the device directly, using device-level technologies such as physical, chemical and metallurgical means. This approach has been taken in the past, but it proves expensive and is hard to scale to the higher standards of color, refresh rate, and spatial and intensity resolution that may be needed in the coming future. New paradigms must be explored to build displays that provide better user experiences and are easy to view and interact with. The conventional direct approach requires new physical properties to be constantly discovered in order to push the envelope in display design; such technologies are hard to come by and may take years to prove their merit and be accepted into the display design pipeline. An alternative is to use computation to enhance displays. Computation today is cheap and easily available: CPUs follow Moore’s law in their ever-expanding compute capability, multi-core CPU architectures are now common even in handheld devices, and GPUs consisting of 1500 cores deliver 1.5 teraFLOPS for $500 today.
This abundance of compute capability has been applied to various systems with much success. Many areas, including computational photography, computational biology and human-computer interaction, take advantage of computation to enhance a base system. Displays too have been shown to benefit from such an approach: fish-tank virtual reality is an example, where 3D viewing is enabled for a head-tracked viewer using a standard display in conjunction with computer graphics. Computation can thus be used to enhance displays or remove their shortcomings. Many displays are underway that combine computation with other means, such as optics, display arrangement, metallurgy and sensors, to provide better and more natural user experiences, bringing out more than what is available on existing displays. Such displays can collectively fall under the term Computational Displays.
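As a concrete instance of enhancing a display through computation, consider increasing intensity resolution by mixing frames over time, the subject of one of the publications listed below. The greedy scheme here is a simplified illustration, not the method from the paper: per frame, it picks the native panel level that keeps the running average closest to the target intensity.

```python
def temporal_mix(target, levels, k):
    """Approximate a target intensity (0..1) over k frames on a display
    that can only show the discrete native `levels`, by choosing per-frame
    levels whose running average tracks the target (a greedy sketch)."""
    chosen, acc = [], 0.0
    for i in range(1, k + 1):
        # pick the level that keeps the running average closest to the target
        level = min(levels, key=lambda v: abs((acc + v) / i - target))
        chosen.append(level)
        acc += level
    return chosen
```

Averaged by the eye over the k frames, the sequence displays an intensity between the panel's native levels; for example, a 1-bit panel can present a mid-gray by alternating black and white frames.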


Year of completion:  June 2013
 Advisor : P. J. Narayanan

Related Publications

  • Pawan Harish, Parikshit Sakurikar, P J Narayanan - Increasing Intensity Resolution on a Single Display using Spatio-Temporal Mixing Proceedings of the 8th Indian Conference on Vision, Graphics and Image Processing, 16-19 Dec. 2012, Bombay, India. [PDF]

  • Pawan Harish and P. J. Narayanan - Designing Perspectively-Correct Multiplanar Display IEEE Transactions on Visualization and Computer Graphics, Vol. 99, 2012. [PDF]

  • Pawan Harish and P.J. Narayanan - View Dependent Rendering to Simple Parametric Display Surfaces Proceedings of 17th Eurographics Symposium on Virtual Environments (EGVE 2011), 20-21 Sep. 2011, ISBN 978-3-905674-33-0, pp. 27-30, Nottingham, UK. [PDF]

  • Nirnimesh, Pawan Harish and P. J. Narayanan - Garuda: A Scalable, Tiled Display Wall Using Commodity PCs IEEE Transactions on Visualization and Computer Graphics (TVCG), Vol. 13, no. 5, pp. 864-877, 2007. [PDF]

  • Nirnimesh, Pawan Harish and P. J. Narayanan - Culling an Object Hierarchy to a Frustum Hierarchy, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338, pp. 252-263, 2006. [PDF]

  • Pawan Harish, Parikshit Sakurikar & P. J. Narayanan - Spatio-Temporal Mixing to Increase Intensity Resolution on a Single Display CVPR Workshop on Computational Cameras and Displays, 2012. (Poster)
  • Pawan Harish & P. J. Narayanan - A View Dependent, Polyhedral 3D Display ACM SIGGRAPH Virtual Reality Continuum and its Applications in Industry, 2009. [PDF]




Recognition and Retrieval from Document Image Collections

Million Meshesha

The present growth in the digitization of books and manuscripts demands an immediate solution for accessing them electronically, so that these valuable archived materials become searchable and usable. This requires research in document image understanding, specifically in document image recognition and document image retrieval. In the last three decades, significant advances have been made in the recognition of documents written in Latin-based and some Oriental scripts. There are many excellent attempts at building robust document analysis systems in industry, academia and research labs, and intelligent recognition systems are commercially available for certain scripts. However, there is only limited research effort towards the recognition of indigenous scripts of African and Indian languages. In addition, the diversity of archived printed documents poses many challenges to document analysis and understanding. Hence, in this work, we explore novel approaches for understanding and accessing the content of document image collections that vary in quality and printing.

In Africa, around 2,500 languages are spoken. Some of these languages have their own indigenous scripts, in which a bulk of printed documents is available in various institutions. Digitizing these documents enables us to harness already available language technologies for local information needs and development. We present an OCR for converting digitized documents in the Amharic language, the official language of Ethiopia. An extensive literature survey reveals that this is the first attempt to report the challenges in recognizing indigenous African scripts together with a possible solution for the Amharic script. Research in the recognition of Amharic script faces major challenges due to (i) the large number of characters used in the writing and (ii) the existence of a large set of visually similar characters. We extract a set of optimal discriminant features to train the classifier. Recognition results are presented on real-life degraded documents, such as books, magazines and newspapers, to demonstrate the performance of the recognizer.

The present OCRs are typically designed to work on a single page at a time. We argue that the recognition scheme for a collection (like a book) could be considerably different from that designed for isolated pages. The motivation here is therefore to exploit all the information available during the recognition process, which has not been effectively used earlier for enhancing the performance of the recognizer. To this end, we propose an architecture and learning algorithms for a self-adaptable OCR framework for the recognition of document image collections. This approach enables the recognizer to learn incrementally and adapt to a document image collection to improve performance. We employ learning procedures to capture the relevant information available online, and feed it back to update the knowledge of the system. Experimental results show the effectiveness of our design in improving the performance of the recognizer on the fly, thereby adapting it to a specific collection.

For the indigenous scripts of African and Indian languages, no robust OCR is available, and designing one is a long-term process. Hence we explore word spotting for retrieving printed document images without explicit recognition. To this end, we propose an effective word image matching scheme that achieves high performance in the presence of script variability, printing variations, degradations and word-form variations. A novel partial matching algorithm is designed for morphological matching of word-form variants in a language. We employ a feature extraction scheme that extracts local features by scanning vertical strips of the word image. These features are then combined based on their discriminatory potential. We present detailed experimental results of the proposed approach on English, Amharic and Hindi documents.

Searching and retrieval from document image collections is challenging because of scalability issues and computational time. We design an efficient indexing scheme to speed up the search for relevant document images. We identify the index terms by clustering word images into groups based on their similarity, where each cluster absorbs a variation in printing, morphology or quality. This is achieved by mapping IR principles, widely used in text processing, to relevance ranking in the image domain. Once the index terms that define the content of a document are clustered, they are indexed using an inverted data structure, which also provides scope for incremental clustering in a dynamic environment. The indexing scheme enables effective search and retrieval in the image domain that is comparable with text search engines. We demonstrate the indexing process with experimental results.
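A minimal sketch of such an index, using cluster identifiers of word-images as index terms and a standard tf-idf-style ranking. The class and its interface are illustrative assumptions, not the thesis implementation.

```python
import math
from collections import defaultdict

class WordImageIndex:
    """Inverted index over cluster ids of word-images ('index terms').
    Cluster ids stand in for recognized words; the interface is a sketch."""

    def __init__(self):
        self.postings = defaultdict(set)    # term -> documents containing it
        self.doc_terms = defaultdict(list)  # document -> its terms

    def add(self, doc, cluster_ids):
        for term in cluster_ids:
            self.postings[term].add(doc)
            self.doc_terms[doc].append(term)

    def search(self, cluster_ids):
        """Rank documents by a tf-idf score over the query terms."""
        n_docs = len(self.doc_terms)
        scores = defaultdict(float)
        for term in cluster_ids:
            docs = self.postings.get(term, set())
            if not docs:
                continue
            idf = math.log(1.0 + n_docs / len(docs))
            for doc in docs:
                scores[doc] += self.doc_terms[doc].count(term) * idf
        return sorted(scores.items(), key=lambda kv: -kv[1])
```

A text query would first be mapped to cluster ids by matching against cluster representatives, after which retrieval proceeds exactly as in a text search engine; new word-images only append postings, which is what makes incremental updates cheap.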

The proliferation of document images at large scale demands an effective solution for accessing document images at the collection level. In this work, we investigate machine learning and information retrieval approaches that suit this demand. Machine learning schemes are used to redesign the existing approach to OCR development: the recognizer learns from its experience and improves its performance over time on document image collections. Information retrieval (IR) principles are mapped to construct an indexing scheme for efficient content-based search and retrieval from document image collections. The existing matching scheme is redesigned to undertake morphological matching in the image domain. Performance evaluation using datasets from different languages shows the effectiveness of our approaches. We conclude by recommending extensions that could further the state of the art in document image recognition and document image retrieval.


Year of completion:  2008
 Advisor : Prof. C.V. Jawahar

Related Publications

  • C. V. Jawahar, A. Balasubramanian, Million Meshesha and Anoop M. Namboodiri - Retrieval of Online Handwriting by Synthesis and Matching International Journal on Pattern Recognition, 42(7), pp. 1445-1457, 2009. [PDF]

  • Million Meshesha and C. V. Jawahar - Matching word image for content-based retrieval from printed document images International Journal on Document Analysis and Recognition (IJDAR), 11(1), pp. 29-38, 2008. [PDF]

  • Million Meshesha and C.V. Jawahar - Self Adaptable Recognizer for Document Image Collections Proceedings of the Second International Conference on Pattern Recognition and Machine Intelligence (PReMI 2007), pp. 560-567, 18-22 Dec. 2007, Kolkata. [PDF]

  • Million Meshesha and C. V. Jawahar - Indigenous Scripts of African Languages African Journal of Indigenous Knowledge Systems, Vol. 6, Issue 2, pp. 132-142, 2007. [PDF]

  • Million Meshesha and C. V. Jawahar - Optical Character Recognition of Amharic Documents African Journal of Information and Communication Technology, ISSN 1449-2679, Volume 3, Number 3, December 2007. [PDF]

  • Pramod Sankar K., Million Meshesha and C. V. Jawahar - Annotation of Images and Videos based on Textual Content without OCR, Workshop on Computation Intensive Methods for Computer Vision (in conjunction with ECCV 2006), 2006. [PDF]

  • A. Balasubramanian, Million Meshesha and C. V. Jawahar - Retrieval from Document Image Collections, Proceedings of Seventh IAPR Workshop on Document Analysis Systems, 2006 (LNCS 3872), pp 1-12. [PDF]

  • Sachin Rawat, K. S. Sesh Kumar, Million Meshesha, Indineel Deb Sikdar, A. Balasubramanian and C. V. Jawahar - A Semi-Automatic Adaptive OCR for Digital Libraries, Proceedings of Seventh IAPR Workshop on Document Analysis Systems, 2006 (LNCS 3872), pp 13-24. [PDF]

  • Million Meshesha and C. V. Jawahar - Recognition of Printed Amharic Documents, Proceedings of Eighth International Conference on Document Analysis and Recognition (ICDAR), Seoul, Korea, 2005, Vol 1, pp. 784-788. [PDF]

  • C. V. Jawahar, Million Meshesha and A. Balasubramanian - Searching in Document Images, Proceedings of the Indian Conference on Vision, Graphics and Image Processing (ICVGIP), Dec. 2004, Calcutta, India, pp. 622-627. [PDF]





Automatic Retinal Image Analysis for the detection of Glaucoma

Gopal Datt Joshi

Glaucoma is recognized as the second most common cause of blindness. Early detection and treatment of glaucoma are hence important for the prevention of the disease. Manual screening for glaucoma at a large scale is challenging, as the availability of skilled manpower in ophthalmology is low. The research focus of this thesis is to investigate the potential of retinal image analysis in glaucoma assessment, towards developing an automatic glaucoma screening system.

The task of glaucoma detection in a retinal image involves the assessment of multiple retinal changes (indicators) associated directly or indirectly with the loss of underlying nerve fibers. Foremost, analyzing both intra- and peri-papillary indicators from the colour retinal image is of prime importance to gather complete information. The intra-papillary indicators include retinal changes occurring within the optic disk (OD) region, whereas peri-papillary indicators account for changes occurring in the periphery of the OD region. The OD consists of a small crater-like depression at its center, physiologically known as the cup region. The annular region formed between the cup and OD boundaries is called the rim, which represents the amount of nerve fiber bundles entering the OD from different areas of the retina. The enlargement of the cup region, called cupping, is the most important intra-papillary indicator for assessing glaucomatous damage.

In this thesis, a set of solutions is developed for detecting intra-papillary indicators by segmenting the OD and cup regions from colour retinal images. A region-based active contour model is proposed to segment the OD. It extends the standard region-based active contour model by including localized image information as a support domain around each point of interest on the contour, for accurate segmentation. The model is further strengthened by integrating information from multiple image feature channels. Experiments conducted on a fairly large number of images show that the proposed method is more robust and accurate than two other existing methods overall, and particularly in dealing with the challenges posed by peri-papillary atrophy regions and irregular OD shapes.

For cup segmentation, methods are designed for a single monocular retinal image and for a sequentially acquired pair of monocular images. While the former uses vessel bend information to estimate the cup boundary, the latter uses multi-view geometry and a novel problem formulation. In this formulation, the objective is redefined as detecting the depth discontinuity in the retinal surface, rather than computing precise depth/disparity, a crucial step in earlier approaches. In other words, cup boundary points are now defined by depth discontinuities (edges) introduced by the cup region in the retinal surface. The proposed solution based on this formulation offers several advantages and flexibility over earlier solutions. Our experimental results show that the proposed formulation applies to simultaneously as well as sequentially acquired stereo image pairs, whereas earlier methods were restricted to simultaneously acquired pairs.

The presence of peri-papillary indicators, such as peri-papillary atrophy (PPA) and retinal nerve fiber layer (RNFL) defects, is considered useful for gaining confidence in the findings of intra-papillary indicators. However, detecting peri-papillary indicators is generally considered challenging, mainly due to the high amount of inter-image variation and the uncertainty in their occurrence. A detection strategy is proposed, motivated by the saliency these indicators exhibit against their local surround. This saliency is represented by a set of context-based image features.

The information obtained from the analysis of intra- and peri-papillary indicators is finally integrated to arrive at a decision on the presence of glaucoma. The proposed integration is at the level of image features. Moreover, regional-level distributions of local image structures are computed to achieve robustness against inter-image variations. The final classification step uses this information to detect glaucoma in a retinal image. An extensive evaluation is carried out on a large test set of 1154 retinal images, and the proposed system is compared with existing CAD-based solutions on various performance metrics. The system is also assessed against three glaucoma fellows and two general ophthalmologists to understand its potential as a screening system.

The assessment results are encouraging, as the proposed system performs as well as general ophthalmologists. Our experimental results indicate that intra-papillary indicators play a more crucial role in the detection of glaucoma than peri-papillary indicators; however, combining the two leads to better detection performance. It is also noticed that the clinically identified intra-papillary measurements are not sufficient to capture the wide range of glaucomatous disk changes.


Year of completion:  July 2014
 Advisor : Prof. Jayanthi Sivaswamy


Related Publications

  • Jayanthi Sivaswamy, S R Krishnadas, Gopal Datt Joshi, Madhulika Jain, Ujjwal, Syed Tabish A. - Drishti-GS: Retinal Image Dataset for Optic Nerve Head (ONH) Segmentation Proceedings of the IEEE International Symposium on Biomedical Imaging, 29 April-2 May 2014, Beijing, China. [PDF]

  • Gopal Datt Joshi, Jayanthi Sivaswamy, Prashanth R and S R Krishnadas - Detection of Peri-papillary Atrophy and RNFL Defect from Retinal Images Proceedings of the International Conference on Image Analysis and Recognition, Aveiro, 2012. [PDF]

  • Gopal Datt Joshi, Jayanthi Sivaswamy and S. R. Krishnadas - Depth discontinuity-based cup segmentation from multi-view colour retinal images IEEE Transactions on Biomedical Engineering, 59(6), pp. 1523-1531, 2012. [PDF]

  • Gopal Datt Joshi, Jayanthi Sivaswamy and S.R. Krishnadas - Optic Disk and Cup Segmentation from Monocular Colour Retinal Images for Glaucoma Assessment IEEE Transactions on Medical Imaging, 30(1), pp. 1192-1205, June 2011. [PDF]

  • Gopal Datt Joshi, Rohit Gautam, Jayanthi Sivaswamy and S. R. Krishnadas - Robust Optic Disk Segmentation from Colour Retinal Images Proceedings of Seventh Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP'10), 12-15 Dec. 2010, Chennai, India. [PDF]

  • Gopal Datt Joshi, Jayanthi Sivaswamy, Kundan Karan, Prashanth R and S.R. Krishnadas - Vessel bend-based cup segmentation in retinal images Proceedings of the 20th International Conference on Pattern Recognition (ICPR'10), 23-26 Aug. 2010, Istanbul, Turkey. [PDF]

  • Gopal Datt Joshi, Jayanthi Sivaswamy, Kundan Karan and S.R. Krishnadas - Optic Disk and Cup Boundary Detection Using Regional Information Proceedings of IEEE International Symposium on Biomedical Imaging: From Nano to Macro (ISBI'10), pp. 948-951, 14-17 April, 2010, Rotterdam, Netherlands. [PDF]

Clinical Abstracts

  • L. Chakrabarty, R. Krishnadas, G. D. Joshi, J. Sivaswamy - Automated Fundus Image Assessment: A Novel Screening Method for Glaucoma, 28th Congress of the Asia-Pacific Academy of Ophthalmology (APAO-AIOS), Hyderabad, 2013 & ASIA-ARVO, New Delhi, 2013.