
Recognition of Indian Language Documents


Introduction

The rapid growth in document digitization demands solutions that make archived materials searchable and usable, so that digital libraries can achieve their objective. In response, active research is under way on indexing and retrieval of relevant documents to make access to digital collections a reality.

A direct solution to this problem is to use character recognition systems to convert document images into text, followed by existing text search engines to make them available to the public at large via the Internet. Significant advances have been made in the recognition of documents written in Latin-based scripts, with many excellent attempts at building robust document analysis systems in industry, academia and research institutions, and intelligent recognition systems are commercially available for some scripts. In contrast, only limited research effort has been directed at the recognition of African, Indian and other oriental languages. In addition, the diversity and complexity of the documents archived in digital libraries complicate document analysis and understanding to a great extent. Machine learning offers one of the most cost-effective and practical approaches to designing pattern classifiers for such documents. Mechanisms such as relevance feedback and feature selection facilitate adaptation to new situations and improve performance accordingly. Deviating from conventional OCR design, we explore the prospect of designing character recognition systems for large collections of document images.


Problems with existing OCRs

Digitized documents are often extremely poor in quality, varying in script, font, size and style, and document images in digital libraries come from diverse languages. Accessing content in such complex collections is a challenging task, especially when no textual representation is available. Though there has been much excellent research on building robust OCR systems, most work on character recognition has centered on fully automatic systems with high-performance classifiers. The recognizers are trained offline and used online without any feedback. However, the diversity of document collections (language-wise, quality-wise, time-wise, etc.) greatly reduces the performance of such systems. Building an omnifont OCR that can convert all of these documents into text therefore does not appear imminent: these systems mostly fail to scale to the expected level of performance when a new font or poor-quality documents are presented. As a result, most of the documents in digital libraries are not accessible by their content.


Challenges in OCR for Indic Scripts

High-accuracy OCR systems have been reported for English, with excellent performance in the presence of printing variations and document degradation. For Indian and many other oriental languages, OCR systems are not yet able to successfully recognize printed document images of varying script, quality, size, style and font. Compared to European languages, Indian languages pose many additional challenges, including: (i) a large number of vowels, consonants and conjuncts; (ii) scripts that spread over several zones; (iii) inflectional morphology and complex character graphemes; (iv) a lack of statistical analyses of the most popular fonts and of databases; and (v) a lack of standard test databases (ground-truth data) for the Indian languages. Issues such as (i) the lack of standard font representations and encodings, (ii) limited support from operating systems, browsers and keyboards, and (iii) the lack of language processing routines further add to the complexity of designing and implementing a document image retrieval system.


Character Recognition systems

We have an OCR system set up for the recognition of Indian and Amharic documents. It provides preprocessing, character segmentation, feature extraction, classification and post-processing modules.



Classifier design and feature selection for large class systems :-

The system accepts already scanned documents or scans document pages from a flat-bed scanner. Scanned pages are preprocessed (binarized, skew-corrected and noise-removed) and segmented into character components. Features are then extracted for classification using a dimensionality reduction scheme such as principal component analysis (PCA); PCA-based features enable us to manage the large number of characters available in the script. These features are used to train a DDAG-based SVM classifier. Finally, characters are recognized and a post-processor is used to correct misclassified ones. The output of the OCR is converted text that can be employed for retrieval tasks.
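
A minimal sketch of this pipeline, assuming scikit-learn and a DDAG evaluated over pairwise SVMs (class and parameter names here are illustrative, not the system's actual code):

```python
# PCA features feeding pairwise SVMs, evaluated as a Decision DAG (DDAG):
# each test eliminates one candidate class until a single class remains.
from itertools import combinations
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

class DDAGClassifier:
    def __init__(self, n_components=64):
        self.pca = PCA(n_components=n_components)
        self.pairwise = {}                    # (i, j) -> SVC for classes i vs j

    def fit(self, X, y):
        Z = self.pca.fit_transform(X)
        self.classes_ = np.unique(y)
        for i, j in combinations(self.classes_, 2):
            mask = (y == i) | (y == j)
            self.pairwise[(i, j)] = SVC(kernel='rbf').fit(Z[mask], y[mask])
        return self

    def predict_one(self, x):
        z = self.pca.transform(x.reshape(1, -1))
        alive = list(self.classes_)           # k candidates, k-1 pairwise tests
        while len(alive) > 1:
            i, j = alive[0], alive[-1]
            winner = self.pairwise[(min(i, j), max(i, j))].predict(z)[0]
            alive.remove(j if winner == i else i)
        return alive[0]
```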

The performance of the OCR needs to be improved so that it can perform well on real-life documents such as books, magazines, newspapers, etc. To this end, we are working towards an intelligent OCR system that learns from its experience.

Annotated Corpora and Test Suite

Large annotated corpora are critical to the development of robust optical character recognizers (OCRs). We propose an efficient hierarchical approach for the annotation of large collections of printed document images. The method is model-driven and is designed to annotate 50M characters from 40,000 documents scanned at three different resolutions. We employ an XML representation for storing the annotation information (one possible layout is sketched after the level descriptions below). APIs are provided for content-level access, for easy use in training and evaluating OCRs and in other document understanding tasks.

Document image annotation is carried out in a hierarchy of levels.
Block Level Annotation: In block level annotation, the document image is segmented into blocks of text paragraphs, tables, pictures and graphs.
Line level annotation is done by labeling the lines of each paragraph in the document image with the corresponding extracted text lines. Line images are labeled by parallel alignment with the text lines.
Akshara annotation is the process of mapping a sequence of connected components from the word image to the corresponding text akshara. Image of akshara text is rendered for matching with components of the word image.
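
A minimal sketch of one plausible XML layout for this hierarchy (document, block, line, word, akshara); the element and attribute names here are assumptions for illustration, not the corpus's actual schema:

```python
# Build a tiny annotation tree with the standard library and write it out.
import xml.etree.ElementTree as ET

doc = ET.Element('document', id='book042_page007', dpi='600')   # hypothetical ids
block = ET.SubElement(doc, 'block', type='text', bbox='120,80,2280,3150')
line = ET.SubElement(block, 'line', bbox='120,80,2280,160')
word = ET.SubElement(line, 'word', bbox='120,80,410,160')
# an akshara maps a run of connected components in the word image to its text
ET.SubElement(word, 'akshara', components='0,1', text='क')

ET.ElementTree(doc).write('page007.xml', encoding='utf-8', xml_declaration=True)
```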


Machine Learning in OCR Systems

Adaptable OCR System (DAS) :-


We present a novel approach to designing a semi-automatic, adaptive OCR for large document image collections in digital libraries, and describe an interactive system for continuously improving the results of the OCR. We implement a semi-automatic and adaptive system and demonstrate its applicability to the recognition of Indian languages. Recognition errors are used to retrain the OCR so that it adapts and learns to improve its accuracy. Limited human intervention is allowed for evaluating the output of the system and taking corrective action during the recognition process.

We design the general architecture of an interactive multilingual OCR (IMOCR) system that is open to learning and adaptation. An overview of the architecture of the system is shown in the figure. The IMOCR design is based on a multi-core approach. At its heart is an application tier, which acts as the interface between the graphical user interface (GUI) and the OCR modules. This application layer identifies the user's choices, initializes data and document structures, and invokes the relevant modules with suitable parameters. The GUI layer provides the user with the tools to configure the data flow, select the plug-ins and supply initialization parameters. The system provides appropriate performance metrics in the form of graphs and tables for better visualization of the results of each step and module during the recognition process.

The last layer is the module/algorithm layer where the actual OCR operations are done. This layer is segmented based on clearly identified functionality. Each module implements a standard interface through which it is invoked by the application, and can internally choose among multiple algorithm implementations of the same functionality that may be interchanged at run time. This helps in selecting an appropriate algorithm or set of parameters for a book, collection or script. The layer is designed on the principle of plug-ins: the system allows transparent run-time addition and selection of modules (as shared objects/dynamic libraries), thereby decoupling the application from the plug-ins. Feature addition and deployment is as simple as copying the plug-in to the appropriate directory. Other advantages of this approach are a smaller application binary, a lower runtime memory footprint and effective memory management (through dynamic loading and unloading, caching, etc.).
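
A minimal sketch of the plug-in idea in Python (the interface name, method and directory layout are illustrative assumptions; the actual system loads shared objects):

```python
# Discover and load OCR modules at run time from a plugins directory.
import importlib.util
import pathlib

class Module:
    """Standard interface every OCR module implements."""
    name = 'base'
    def run(self, page, params):               # page image in, result out
        raise NotImplementedError

def load_plugins(directory='plugins'):
    registry = {}
    for path in pathlib.Path(directory).glob('*.py'):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        mod = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(mod)           # import the plug-in file
        for obj in vars(mod).values():
            if isinstance(obj, type) and issubclass(obj, Module) and obj is not Module:
                registry[obj.name] = obj()     # register each advertised module
    return registry

# The application tier can then swap algorithms per book, collection or script:
# registry['binarize'].run(page, {'method': 'otsu'})
```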

Book OCR :-

To alleviate such problems there is a need to build a new generation of document analysis systems: systems intelligent enough to learn and adjust themselves to new documents that vary in quality, font, style and size. Such systems should learn not through explicit training but through feedback during normal operation, and they are expected to reach an accuracy that facilitates effective access to relevant documents in these large collections of document images.

To this end, we propose a new approach to the recognition of large collections of text images using an interactive (adaptive) learning system. In this project we show, for the first time, an OCR system that learns from its mistakes and improves its performance continuously across a large collection of document images through a feedback mechanism. The system is designed to let the OCR learn on the fly using knowledge derived from an input sequence of word images. The resulting system is expected to learn the characteristics of symbol shape, context and noise present in the collection, and thus to generalize and achieve higher accuracy across diverse documents.

This lets us build a generic OCR that has a minimal core and leaves most of the sophisticated tuning to online learning. This can be a recurrent process involving frequent feedback and backtracking. Such a strategy is valuable in a situation like ours, where there is a huge collection of degraded documents in different languages and human operators are working on OCRing the scanned books. We integrate a number of modules to realize the learning framework. A post-processor-based feedback mechanism identifies recognition errors and passes them on as new training data. The quality of the new training samples is controlled using validation techniques in the image space. We propose an incremental learning approach to make online training of the classifier efficient during normal operation. Sampling ensures that new data are fed for training each time, and appropriate features are selected at each node of the DDAG for pair-wise classification.
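
A minimal sketch of one such feedback round, assuming scikit-learn; the incremental update via SGDClassifier.partial_fit and the validation stub stand in for the system's DDAG retraining and image-space checks:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

N_CLASSES = 128                                # assumed size of the symbol set
clf = SGDClassifier(loss='log_loss')

def validate_in_image_space(word_image, label):
    # Placeholder for the image-space validation described above,
    # e.g. matching the word image against a rendering of `label`.
    return True

def feedback_round(clf, samples):
    """samples: iterable of (word_image, features, predicted, corrected)."""
    X_new, y_new = [], []
    for img, x, y_hat, y_true in samples:
        if y_hat != y_true and validate_in_image_space(img, y_true):
            X_new.append(x)                    # a validated recognition error
            y_new.append(y_true)
    if X_new:                                  # incremental update, no full retrain
        clf.partial_fit(np.array(X_new), np.array(y_new),
                        classes=np.arange(N_CLASSES))
    return clf
```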

To suit Learn OCR, an intelligent post-processor needs to be designed; at present we are investigating language modeling techniques in the image space. Sample data are being generated by Learn OCR through feedback, and these data need to be used systematically for performance improvement. We are also working towards applying the learning framework to labeling unlabeled data.


Algorithms for Document Understanding

Segmentation :

The problem of document segmentation is that of dividing a document image (I) into a hierarchy of meaningful regions such as paragraphs, text lines, words, image regions, etc. These regions are associated with a homogeneity property Φ(·), and the segmentation algorithms are parameterized by θ. Conventionally, segmentation has been viewed as a deterministic partitioning scheme characterized by the parameter θ, and the challenge has been to find an optimal set of values for θ that segments the input image I into appropriate regions. In our formulation, the parameter vector θ is learned from feedback computed in the form of a homogeneity measure Φ(·), a model-based metric representing the distance from the ideal segmentation. The feedback is propagated upwards in the hierarchy, estimating new parameter values at each level above to improve the overall performance of the system; the values of θ are thus learned from feedback from the present and lower levels. To improve an algorithm's performance over time, using multiple examples or by processing an image multiple times, the algorithm is given feedback in the form of the homogeneity measure of the segmented regions. Such feedback mechanisms for learning the parameters can be employed at various levels.
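
A minimal sketch of this feedback idea: perturb θ and keep values that improve the homogeneity measure. Both segment() and phi() are placeholders for the actual segmentation routine and Φ(·):

```python
import numpy as np

def learn_theta(image, theta, segment, phi, iters=50, step=0.1):
    """Random-search update of segmentation parameters theta using
    the homogeneity feedback phi (lower = closer to ideal)."""
    best = phi(segment(image, theta))
    for _ in range(iters):
        cand = theta + step * np.random.standard_normal(theta.shape)
        score = phi(segment(image, cand))
        if score < best:                       # feedback says cand is better
            theta, best = cand, score
    return theta
```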

Perspective Correction for Camera-based Document Analysis :

We demonstrate the intelligent use of commonly available clues for the rectification of document images for camera-based analysis and recognition. Camera-based imaging poses many challenges, such as projection distortion, uneven lighting and lens distortion.

There are different rectification techniques. One is the determination of document image boundaries using the aspect ratio of the document and parallel and perpendicular lines: if the original aspect ratio of the rectangle is known, the vertices of the quadrilateral can be used to obtain the homography from an arbitrary view to the frontal view (a minimal sketch of this is given below). Another is the use of page layout and structural information; we can extract information such as the text and graphics blocks present in the image, or the repetitive or a priori known structure of cells in tables. Finally, content-specific rectification uses properties of the text (or the content of the image itself); for example, the shirorekha can be used for text written in Devanagari and Bangla to estimate the vanishing point.
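
A minimal sketch of the first technique using OpenCV; the corner coordinates, page size and file name are illustrative:

```python
import cv2
import numpy as np

# Four detected corners of the page quadrilateral (TL, TR, BR, BL).
quad = np.float32([[105, 60], [820, 95], [860, 1180], [70, 1150]])

aspect = 210 / 297.0                           # known width/height, e.g. an A4 page
h = 1200
w = int(h * aspect)
frontal = np.float32([[0, 0], [w, 0], [w, h], [0, h]])

H = cv2.getPerspectiveTransform(quad, frontal) # homography to the frontal view
rectified = cv2.warpPerspective(cv2.imread('page.jpg'), H, (w, h))
```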


Related Publications

  • K.S.Sesh Kumar, Anoop M. Namboodiri and C. V. Jawahar - Learning Segmentation of Documents with Complex Scripts, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India LNCS 4338 pp.749-760, 2006. [PDF]

  • Sachin Rawat, K. S. Sesh Kumar, Million Meshesha, Indraneel Deb Sikdar, A. Balasubramanian and C. V. Jawahar - A Semi-Automatic Adaptive OCR for Digital Libraries, Proceedings of Seventh IAPR Workshop on Document Analysis Systems, 2006 (LNCS 3872), pp 13-24. [PDF]

  • M. N. S. S. K. Pavan Kumar and C. V. Jawahar, Design of Hierarchical Classifier with Hybrid Architectures, Proceedings of First International Conference on Pattern Recognition and Machine Intelligence(PReMI 2005) Kolkata, India. December 2005, pp 276-279. [PDF]

  • Million Meshesha and C. V. Jawahar  - Recognition of Printed Amharic Documents, Proceedings of Eighth International Conference on Document Analysis and Recognition(ICDAR), Seoul, Korea 2005, Vol 1, pp 784-788. [PDF]

  • M. N. S. S. K. Pavan Kumar and C. V. Jawahar, Configurable Hybrid Architectures for Character Recognition Applications, Proceedings of Eighth International Conference on Document Analysis and Recognition(ICDAR), Seoul, Korea 2005, Vol 1, pp 1199-1203. [PDF]

  • C. V. Jawahar, MNSSK Pavan Kumar and S. S. Ravikiran - A Bilingual OCR system for Hindi-Telugu Documents and its Applications, Proceedings of the International Conference on Document Analysis and Recognition(ICDAR) Aug. 2003, Edinburgh, Scotland, pp. 408--413. [PDF]

 


Associated People

  • Million Meshesha
  • Balasubramanian Anand
  • Sesh Kumar
  • L. Jagannathan
  • Neeba N V
  • Venkat Rasagna
  • Dr. C. V. Jawahar

 

Terrain Processing


Introduction

A Digital Terrain Model refers to a data model that attempts to provide a three-dimensional representation of a continuous surface; terrains are two-and-a-half dimensional rather than fully three-dimensional. Terrains are basic models that appear in most games, animations and other multimedia and visualization applications.

Terrain geometry can also be stored remotely, received progressively when needed, and rendered on the fly. Serving and streaming terrains can be both beneficial and difficult. Dynamic environments, such as those used for battlefield visualization involving real terrains and multiple players, are one example. Geometry servers are also useful for controlling the level of access to sensitive virtual environments. Remote access to geometry is thus an interesting mix of data serving and content streaming.


Streaming & Rendering

We aim at rendering rich terrains efficiently. Some of the features of the system are:

  • Seamless and fast display of terrains integrating several levels of detail.
  • Real-time editing of annotations and addition of dynamic entities on the terrain.
  • Smooth blending of heights across discrete levels of detail to avoid popping of the terrain.
  • Users can record a walkthrough and play it back as a demo later.
  • View-frustum culling with tiles as the basic entity reduces the data to be rendered.
  • The stereo modes supported are anaglyphic, polarization-based stereo and shutter-glass stereo.
  • Motion of objects such as balls and mercury droplets is simulated.

The basic objective of a geometry server is to provide each client with data appropriate to it as quickly as possible. The main features of the streaming system developed are (a sketch of the client-side cache follows this list):

  • A transparent streaming system which treats remote and local objects without distinction.
  • The client module performs caching and predictions needed for better performance and interface with the server.
  • A suitable level of detail is sent to each client based on capabilities of rendering hardware.
  • Support for Dynamic Objects and local modification.
  • Changes to an existing model in the virtual environment are notified to all clients possessing it.
  • The system uses a combination of visibility culling, client-side caching, speculative prefetching, motion prediction and deep compression to achieve performance akin to local rendering.
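
A minimal sketch of the client-side caching mentioned above: an LRU cache of terrain tiles keyed by (tile, level of detail), fetching from the server only on a miss. The fetch callback is a placeholder:

```python
from collections import OrderedDict

class TileCache:
    def __init__(self, capacity=256):
        self.capacity = capacity
        self.tiles = OrderedDict()             # (tile_id, lod) -> geometry

    def get(self, tile_id, lod, fetch_from_server):
        key = (tile_id, lod)
        if key in self.tiles:
            self.tiles.move_to_end(key)        # mark as most recently used
            return self.tiles[key]
        geom = fetch_from_server(tile_id, lod) # miss: ask the geometry server
        self.tiles[key] = geom
        if len(self.tiles) > self.capacity:
            self.tiles.popitem(last=False)     # evict least recently used tile
        return geom
```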

 


Related Publications

  • Soumyajit Deb, P. J. Narayanan and Shiben Bhattacharjee - Streaming Terrain Rendering, The 33rd International Conference and Exhibition on Computer Graphics and Interactive Techniques, Boston Convention and Exhibition Center, Boston, Massachusetts, USA, 30 July - 3 August, 2006. [PDF]

  • Soumyajit Deb, Shiben Bhattacharjee, Suryakant Patidar and P. J. Narayanan - Real-time Streaming and Rendering of Terrains, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338 pp.276-288, 2006. [PDF]

  • Soumyajit Deb and P. J. Narayanan, Design of a Geometry Streaming System, Proceedings of the Indian Conference on Vision, Graphics and Image Processing(ICVGIP), Dec. 2004, Calcutta, India, pp. 296--301. [PDF]


Associated People

  • Shiben Bhattacharjee
  • Suryakant Patidar
  • Soumyajit Deb
  • Prof. P. J. Narayanan

 

Ray Tracing on the GPU


Introduction

Ray tracing is used to render photorealistic images in computer graphics. We work on real time ray tracing of static, dynamic and deformable scenes composed of polygons as well as higher order surfaces. This has been made possible with the recent advances in GPU hardware and computing resources. We have developed various algorithms that perform real time ray tracing of various kinds of scenes by constructing efficient acceleration structures on the GPU as well as algorithms that work without constructing any explicit structure.


Projects involving Ray Tracing using CUDA

1. Parallel Divide and Conquer Ray Tracing

Divide and Conquer Ray Tracing is a recent technique in which ray tracing is performed without constructing any explicit acceleration structure. We have developed a parallel version of the algorithm that runs entirely on the GPU using efficient parallel primitives like sort and reduce. Our approach suits the GPU well, with a low memory footprint. Our implementation outperforms the serial CPU algorithm for both primary and secondary ray passes.

 


2. Hybrid Ray Tracing and Path Tracing of Bezier Surfaces Using A Mixed Hierarchy

We present a scheme for interactive ray tracing of bicubic Bezier patches using Newton iteration. We use a mixed hierarchy representation as the acceleration structure: a bounding volume hierarchy above the patches and a fixed-depth subpatch tree below it. This reduces the number of ray-patch intersections that need to be evaluated and provides good initialization for the iterative step, keeping the memory requirements low. We run Newton iteration on the generated list of ray-patch intersections in parallel (the Newton step is sketched below). Our method can exploit the cores of both the CPU and the GPU, with OpenMP on the CPU and CUDA on the GPU, sharing work between them according to their relative speeds. A data-parallel framework is used: a list of rays is transformed into a list of ray-patch intersections by traversal, and then into intersections and a list of secondary rays by root finding. As a result, shadow and reflection rays can be handled in exactly the same manner. We also show how our method extends easily to soft shadows using area light sources, and to path tracing by tracing a large number of rays per pixel.
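
A minimal sketch of the Newton step for ray-patch intersection: solve o + t·d = S(u,v) for (t,u,v), with the subpatch tree supplying the initial guess. S, Su and Sv, which evaluate the bicubic patch and its partial derivatives, are placeholders:

```python
import numpy as np

def newton_intersect(o, d, S, Su, Sv, t, u, v, iters=8, eps=1e-6):
    for _ in range(iters):
        F = o + t * d - S(u, v)                # residual; a root means a hit
        if np.linalg.norm(F) < eps:
            return t, u, v
        J = np.column_stack((d, -Su(u, v), -Sv(u, v)))
        try:
            dt, du, dv = np.linalg.solve(J, -F)
        except np.linalg.LinAlgError:
            return None                        # singular Jacobian: give up
        t, u, v = t + dt, u + du, v + dv
    return None                                # did not converge: treat as a miss
```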

 

3. Ray Tracing Dynamic Scenes on the GPU

We present fast ray tracing of dynamic scenes with primary and shadow rays. We present a GPU-friendly strategy to bring coherency to shadow rays, based on previous work on grids as acceleration structures. We introduce indirect mapping of threads to rays to improve the performance of ray tracing on the GPU for the traversal and intersection steps. We also construct a light frustum in a spherical space for shadow rays. A grid structure is constructed each frame for the light frustum and traversed coherently. This involves careful mapping of the primary ray information to the light space and balancing the work load of the threads. Using the fine-grained parallelism of the GPU, we reorder the shadow rays to make them coherent and process multiple thread blocks per cell to balance the work load. Spherical mapping is key to handling light sources placed anywhere in the scene by reducing the triangle count and improving performance in shadow checking. In addition, it also allows us to introduce spotlights in ray tracing. In practice, we attain interactive performance for moderately large models which change dynamically in the scene.

 

4. Ray Tracing Implicit Surfaces in Real-Time

We present a ray-tracing procedure to render general implicit surfaces efficiently on the GPU. Though only fourth- or lower-order surfaces can be rendered using analytical roots, our adaptive marching points algorithm can ray trace arbitrary implicit surfaces without multiple roots by sampling the ray at selected points until a root is found (a simplified sketch is given below). Adapting the sampling step size based on a proximity measure and a horizon measure delivers high speed. The sign test can handle any surface without multiple roots, while the Taylor test, which uses ideas from interval analysis, can ray trace many surfaces with complex roots. Overall, a simple algorithm that fits the SIMD architecture of the GPU results in high performance.
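
A minimal sketch of the marching idea: step along the ray, use the sign test to bracket a root of the implicit function f, then refine by bisection. The adaptive step-size heuristics of the method are reduced to a fixed step here:

```python
import numpy as np

def march_ray(f, o, d, t_max=100.0, step=0.05, refine=30):
    t, prev = step, f(o)                       # f evaluated at the ray origin
    while t < t_max:
        val = f(o + t * d)
        if prev * val < 0:                     # sign change: root in (t-step, t)
            lo, hi = t - step, t
            for _ in range(refine):            # bisection refinement
                mid = 0.5 * (lo + hi)
                if f(o + lo * d) * f(o + mid * d) < 0:
                    hi = mid
                else:
                    lo = mid
            return 0.5 * (lo + hi)
        prev, t = val, t + step
    return None                                # no sign change: ray misses

# e.g. a unit sphere:
# march_ray(lambda p: p @ p - 1.0, np.array([0., 0., -3.]), np.array([0., 0., 1.]))
```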

 

 


5. Ray Casting Deformable Models on CUDA

We explore the problem of real-time ray casting of large deformable models (over a million triangles) on large displays (a million pixels) on an off-the-shelf GPU. We build a GPU-efficient three-dimensional data structure for this purpose and a corresponding algorithm that uses it for fast ray casting. We also present fast methods to build the data structure on SIMD GPUs, including a fast multi-split operation (sketched below). We achieve real-time ray casting of a million-triangle model onto a million pixels on current-generation hardware.
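
A minimal sketch of what a multi-split does, written sequentially on the CPU: scatter elements into k bins while keeping their relative order, the operation performed in parallel on the GPU when building the data structure. Inputs are assumed to be NumPy arrays:

```python
import numpy as np

def multi_split(keys, bin_ids, k):
    """Stable partition of `keys` into k bins given per-element `bin_ids`."""
    counts = np.bincount(bin_ids, minlength=k)               # histogram of bin sizes
    offsets = np.concatenate(([0], np.cumsum(counts)[:-1]))  # exclusive scan
    out = np.empty_like(keys)
    cursor = offsets.copy()
    for key, b in zip(keys, bin_ids):                        # stable scatter
        out[cursor[b]] = key
        cursor[b] += 1
    return out, offsets                                      # bins and their starts
```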

 

 


Related Publications

  • Srinath Ravichandran and P. J. Narayanan - Parallel Divide and Conquer Ray Tracing In SIGGRAPH ASIA 2013 Technical Briefs, 19-22nd Nov. 2013, Hong Kong. [PDF]

  • Rohit Nigam, P J Narayanan - Hybrid Ray Tracing and Path Tracing of Bezier Surfaces Using A Mixed Hierarchy Proceedings of the 8th Indian Conference on Vision, Graphics and Image Processing, 16-19 Dec. 2012, Bombay, India. [PDF]

  • Sashidhar Guntury and P.J. Narayanan - Raytracing Dynamic Scenes on the GPU Using Grids IEEE Transactions on Visualization and Computer Graphics, Vol.18, No.1 pp. 5-16, 2012. [PDF]

  • Sashidhar Guntury and P. J. Narayanan - Ray Tracing Dynamic Scenes with Shadows on the GPU Proceedings of Eurographics Symposium on Parallel Graphics and Visualization (EGPGV'10), pp.27-34, 2-3 May, 2010, Norrkoping, Sweden. [PDF]

  • Jag Mohan Singh and P. J. Narayanan - Real-Time Ray Tracing of Implicit Surfaces on the GPU IEEE Transactions Visualization and Computer Graphics, Vol. 16(2), pp. 261-272 (2010). [PDF]

  • Suryakant Patidar and P. J. Narayanan - Ray Casting Deformable Models In: ACM ICVGIP 2008. [PDF]

Associated People

  • Srinath Ravichandran
  • Sashidhar Guntury
  • Suryakant Patidar
  • Rohit Nigam
  • Jag Mohan Singh
  • Prof. P.J. Narayanan

 

Scene Text Understanding


 


Motivation

Recognizing scene text is a challenging problem, even more so than recognizing scanned documents. Given the rapid growth of camera-based applications readily available on mobile phones, understanding scene text is more important than ever. One could, for instance, foresee an application that answers questions such as, “What does this sign say?”. This is related to the problem of optical character recognition (OCR), which has a long history in the computer vision community. However, the success of OCR systems is largely restricted to text from scanned documents; scene text exhibits a large variability in appearance and can prove challenging even for state-of-the-art OCR methods. Many scene understanding methods successfully recognize objects and regions such as roads, trees and sky in an image, but tend to ignore the text on sign boards. Our goal is to fill this gap in understanding the scene.


Highlights

  • Binarization as a labelling problem (see our ICDAR'11 paper)
  • Both open and closed vocabulary word recognition (see our CVPR'12 paper and BMVC'12 paper)
  • Use of both top-down (lexicons) and bottom-up cues (character detection)
  • Holistic recognition approach of lexicon-driven scene text recognition (see our ICDAR'13 paper)
  • Applications in image retrieval (see our ICCV'13 paper)
  • Word recognition for large lexicons and cropped word image retrieval (see our ACCV'14 paper)

Code

Generating exemplars (based on our ICDAR'13 paper) README

Coming Soon:

  • Scene Text Binarization
  • Scene Character Recognition

Datasets

Lexicons (ACCV '14)

IIIT 5K-word

SVT-CHAR (9.9MB)  README

IIIT Scene Text Retrieval (IIIT STR)

Video Scene Text Retrieval Datasets (Sports-10K and TV series-1M)


Related Publications

  • Anand Mishra, Karteek Alahari and C. V. Jawahar - Enhancing energy minimization framework for scene text recognition with top-down cues - Computer Vision and Image Understanding (CVIU 2016), volume 145, pages 30–42, 2016. [PDF]

  • Udit Roy, Anand Mishra, Karteek Alahari, C.V. Jawahar - Scene Text Recognition and Retrieval for Large Lexicons Proceedings of the 12th Asian Conference on Computer Vision, 01-05 Nov 2014, Singapore. [PDF] [Abstract] [Poster] [Lexicons] [bibtex]

  • Anand Mishra, Karteek Alahari and C V Jawahar - Image Retrieval using Textual Cues Proceedings of International Conference on Computer Vision, 1-8 Dec. 2013, Sydney, Australia. [PDF] [Abstract] [Project page] [bibtex]

  • Vibhor Goel, Anand Mishra, Karteek Alahari, C V Jawahar - Whole is Greater than Sum of Parts: Recognizing Scene Text Words Proceedings of the 12th International Conference on Document Analysis and Recognition, 25-28 Aug. 2013, Washington DC, USA. [PDF] [Abstract] [bibtex]

  • Anand Mishra, Karteek Alahari and C V Jawahar - Scene Text Recognition using Higher Order Language Priors Proceedings of British Machine Vision Conference, 3-7 Sep. 2012, Guildford, UK. [PDF] [Abstract] [Slides] [bibtex]

  • Anand Mishra, Karteek Alahari and C V Jawahar - Top-down and Bottom-up Cues for Scene Text Recognition Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 16-21 June 2012, pp. 2287-2294, Providence RI, USA. [PDF] [Abstract] [Poster] [bibtex]

  • Anand Mishra, Karteek Alahari and C.V. Jawahar - An MRF Model for Binarization of Natural Scene Text Proceedings of 11th International Conference on Document Analysis and Recognition (ICDAR 2011), 18-21 September 2011, Beijing, China. [PDF] [Abstract] [Slides] [bibtex]

 


People

  • Udit Roy
  • Anand Mishra
  • Karteek Alahari
  • C. V. Jawahar

Acknowledgements

Anand Mishra is partly supported by MSR India PhD Fellowship 2012. 


Copyright Notice

The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright.

Computational Displays


Introduction

Displays have seen many improvements over the years. With advances in pixel resolution, color gamut, vertical refresh rate and power consumption, displays have become more personal and accessible devices for sharing and consuming information. Advances in human-computer interaction have enabled novel interactions with current displays; touch panels are now common and provide better interaction with the cyberworld shown on a device. Still, displays today have many shortcomings, including their rectangular shape, low color gamut, low dynamic range, lack of focus and context in a scene, and lack of 3D viewing. Efforts are being made to create better and more natural displays than the flat, rectangular 2D screens we see today, and much research has gone into designing better displays, including 3D displays, focus-and-context displays and HDR displays. Such displays work at the device level and use physical (mechanical, metallurgical, chemical, etc.) means to improve the display; these approaches prove expensive and are hard to scale to the standards of color, refresh rate, etc. of current displays. An alternative is to use computation to enhance displays. We propose Computational Displays, which employ computation to economically alleviate some of the shortcomings of available displays. Specifically, we give solutions to the walk-around 3D viewing, focus+context and color resolution problems. The systems work on top of existing display methodologies and are independent of any specific display technology; this makes our systems scalable to any display method, enhancing it using computation. In conclusion, we argue that such computation/algorithms should be built into the displays themselves to enhance the visual experience.


View Dependent, Multi-Planar, Walkaround 3D Displays

 


Displays have remained flat for the most part since their inception. In this work we focus on non-planar displays made of planar facets, and on displaying 3D scenes on such displays in a perspective-correct manner. We propose an accurate rendering mechanism using GPU shaders that produces a correct color and depth map on each facet, ensuring consistency across facet boundaries. We compare our results with previous methods such as CAVE and Projective Texture Mapping and conclude that our method is artifact-free and superior in visual quality, a requirement for visualization applications. We also provide a scalable GPU culling algorithm to scale our rendering scheme to display shapes consisting of over a thousand facets. The proposed pipeline uses commodity GPUs and can handle any type of scene, even mesh deformations, as shown in the video.


Distributed Massive Model Rendering

 


Graphics models are getting increasingly bulky, with detailed geometry, textures, normal maps, etc. There is much interest in modeling and navigating through detailed models of large monuments. Many monuments of interest have both rich detail and large spatial extent; rendering them for navigation on a single workstation is practically impossible, even given the power of today's CPUs and GPUs. Many models may not fit in the GPU memory, the CPU memory, or even the secondary storage of the CPU. Distributed rendering using a cluster of workstations is the only way to navigate through such models. We present the design of a distributed rendering system intended for massive models. Our design has a server that holds the skeleton of the whole model, namely its scenegraph with the actual geometry replaced by bounding boxes at all levels. The server divides the screen space among a number of clients and, using a frustum-culling step (sketched below), sends each a list of objects it needs to render. The clients use two GPUs, with one devoted to visibility culling and the other to rendering. Frustum culling at the server, visibility culling on one GPU, and rendering on the second GPU form the stages of our distributed rendering pipeline. We describe the design and implementation of our system and demonstrate the results of rendering relatively large models using different clusters of clients. The demonstration video shows the interactive rendering of huge scenes of the Coal Powerplant (~96M triangles) and Fatehpur Sikri (~172M triangles) models on a 4-client setup.
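
A minimal sketch of the server's frustum-culling step: test each node's axis-aligned bounding box against the view-frustum planes and recurse only into visible children. The Node type (with lo, hi and children fields) and the plane convention (normals pointing into the frustum) are assumptions:

```python
import numpy as np

def aabb_visible(lo, hi, planes):
    for n, d in planes:                        # plane: n . x + d >= 0 inside
        p = np.where(n >= 0, hi, lo)           # box corner farthest along n
        if np.dot(n, p) + d < 0:
            return False                       # box entirely outside this plane
    return True

def cull(node, planes, visible):
    if not aabb_visible(node.lo, node.hi, planes):
        return                                 # prune the whole subtree
    if not node.children:
        visible.append(node)                   # leaf: assign to a client to render
    for child in node.children:
        cull(child, planes, visible)
```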


Increasing Intensity Resolution on Single Displays

 


Displays have seen many improvements in spatial resolution and vertical refresh, providing a smoother visual experience. Color intensity resolution, however, has not changed much: most displays are still limited to 8 bits per channel. Meanwhile, much work has gone into capturing high-dynamic-range images, and mapping these directly to current displays loses information that may be critical to many applications. We present a way to enhance the intensity resolution of a given display by mixing intensities over the spatial or temporal domain (a sketch of temporal mixing is given below); our system sacrifices high vertical refresh and spatial resolution in order to gain intensity resolution. We present three ways to mix intensities: spatially, temporally and spatio-temporally. The systems produce in-between intensities not present on the base display, which are clearly distinguishable by the naked eye. We evaluate our systems using both a camera and human subjects, checking whether the intensity resolution is indeed scaled and ensuring that the newly generated intensities follow the display model.
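
A minimal sketch of temporal mixing: show n 8-bit subframes in quick succession so that the eye integrates an in-between intensity, trading refresh rate for intensity resolution. The rounding scheme here is an illustrative simplification:

```python
import numpy as np

def temporal_subframes(target, n_sub=2):
    """target: float image in [0, 255]; returns n_sub displayable 8-bit
    subframes whose temporal average approximates the target."""
    lo = np.floor(target).astype(np.uint16)
    # number of subframes (out of n_sub) that show lo+1 instead of lo
    n_bump = np.rint((target - lo) * n_sub).astype(int)
    return [np.clip(lo + (i < n_bump), 0, 255).astype(np.uint8)
            for i in range(n_sub)]

# e.g. a constant 127.5 image on a 2-subframe cycle alternates 127 and 128.
```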


View Dependent Parametric/Implicit Displays

 


This work extends view-dependent displays to non-planar parametric shapes. Here the display surface is defined not by planar facets but by a set of parametric or implicit equations. Rendering perspective-correct 3D on such a display requires non-linear rasterization, which no graphics pipeline currently supports. We approximate it using the tessellation hardware provided by Shader Model 5.0 / DirectX 11 GPUs: large triangles are broken on the fly to a user-defined error threshold, and the resulting vertices are then moved using per-vertex raycasting to correct their positions as seen from the observer's eye (a sketch of this correction is given below). This produces a correct perspective view for a user whose head is tracked. The method is an approximate rendering scheme for any parametric display but, unlike Projective Texture Mapping, does not interpolate pixels. The rendering speed is orders of magnitude faster than approximating the surface with planes, owing to single-pass rendering and the use of tessellation hardware.
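
A minimal sketch of the per-vertex correction for an assumed spherical display of radius R: cast a ray from the tracked eye through each tessellated vertex and move the vertex to where the ray meets the display surface:

```python
import numpy as np

def project_to_sphere(eye, vertex, center, R):
    d = vertex - eye
    d = d / np.linalg.norm(d)                  # ray direction from the eye
    oc = eye - center
    b = np.dot(oc, d)
    disc = b * b - (np.dot(oc, oc) - R * R)
    if disc < 0:
        return None                            # ray misses the display surface
    t = -b + np.sqrt(disc)                     # far hit: the dome seen from inside
    return eye + t * d                         # corrected vertex position
```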


Garuda: A Scalable, Tiled Display Wall using Commodity PCs

 


Garuda is a client-server display wall system employing distributed rendering to render massive 3D environments at interactive frame rates on a tiled display, designed and built from commodity hardware. Features such as client caching and a server-push philosophy, in conjunction with UDP multicast, help the system scale to very large tiled configurations. The system uses a novel culling algorithm to determine which parts of the scene fall in which tiles; the algorithm is scalable both in scene complexity and in the number of tiles. The system is built as a library intercept mechanism over the Open Scene Graph (OSG) API, so any OSG application can be ported to Garuda without modifying, recompiling or relinking the code: the OSG executable runs directly on the tiled display. Rendering capability increases with the number of tiles, as distributed rendering is employed; no machine in Garuda, server or client, renders the entire environment. Culling is performed at the server, and parts of the scene graph are sent to the clients over standard Ethernet. Clients cache the geometry and evict it with an LRU policy, which exploits the temporal coherence in the scene. Rendering happens at the client end, with the server used to synchronize the buffer swap.


Related Publications

  • Pawan Harish, Parikshit Sakurikar, P J Narayanan - Increasing Intensity Resolution on a Single Display using Spatio-Temporal Mixing Proceedings of the 8th Indian Conference on Vision, Graphics and Image Processing, 16-19 Dec. 2012, Bombay, India. [PDF]

  • Revanth N R, P J Narayanan - Distributed Massive Model Rendering Proceedings of the 8th Indian Conference on Vision, Graphics and Image Processing, 16-19 Dec. 2012, Bombay, India. [PDF]

  • Pawan Harish and P. J. Narayanan - Designing Perspectively-Correct Multiplanar Display IEEE Transactions on Visualization and Computer Graphics, Vol.99, 2012. [PDF]

  • Nirnimesh, Pawan Harish and P. J. Narayanan - Culling an Object Hierarchy to a Frustum Hierarchy, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338, pp. 252-263, 2006. [PDF]

 

  • Pawan Harish, Parikshit Sakurikar, P. J. Narayanan - Spatio-Temporal Mixing to Increase Intensity Resolution on a Single Display (Poster) In CVPR Workshop on Computational Cameras and Displays 2012. [PDF]
  • Pawan Harish, P. J. Narayanan - View Dependent Rendering to Simple Parametric Display Surfaces (Short Paper), In JVRC of EuroVR-EGVE, 2011. [PDF]
  • Pawan Harish, P. J. Narayanan - A View Dependent, Polyhedral 3D Display, in ACM SIGGRAPH Virtual Reality Continuum and its Applications in Industry 2009. [PDF]
  • Nirnimesh, Pawan Harish, P. J. Narayanan - Garuda: A Scalable Tiled Display Wall using Commodity PCs, In IEEE TVCG, 2007. [PDF]

Related Videos

  • Multiplanar Displays
  • Parametric Displays
  • Polyhedral Displays
  • Garuda Display Wall

Associated People

  • Revanth N R
  • Parikshit Sakurikar
  • Pawan Harish
  • Nirnimesh
  • Prof. P. J. Narayanan
