
Real Time Rendering of Implicit Surfaces on the GPU


Jag Mohan Singh

Generating visually realistic models is one of the core problems of computer graphics. Rasterization, or scan conversion of primitives such as triangles, is one way to render them, but it suffers from an inexact representation since the triangles are themselves an approximation of the underlying geometry. Ray tracing the primitives is another way to render objects; it works with the exact representation of the underlying geometry and produces visually realistic results. We therefore ray trace implicit surfaces instead of polygonizing them. Programmable graphics processing units (GPUs) have high computational capability but relatively limited bandwidth for data access. A compact representation of geometry using a suitable procedural or mathematical model, combined with ray tracing as the mode of rendering, consequently fits the GPU well.

An implicit surface can be represented as S(x, y, z) = 0 and the corresponding ray-dependent equation is F_f(t) = 0. Ray tracing S(x, y, z) = 0 amounts to computing the roots of F_f(t) = 0 for every pixel on the screen. Analytical methods can be used for surfaces up to order 4. We compute the interval extension of a function exactly by evaluating it at its points of maxima and minima and at the end points of the interval; since we can compute the roots of functions up to order 4, we can compute the extrema of functions up to order 5. We therefore use interval arithmetic, with Mitchell's algorithm, for surfaces up to order 5. Interval methods provide a robust way to isolate roots.

The marching points algorithm marches along the ray in equal step sizes until a root is found, detected by a sign change in the function; it wastes computation by evaluating the function at many points. The adaptive marching points algorithm instead marches adaptively. Though only fourth or lower order surfaces can be rendered using analytical roots, our adaptive marching points algorithm can ray trace arbitrary implicit surfaces exactly by sampling the ray at selected points until a root is found. Adapting the sampling step size based on a proximity measure and a horizon measure delivers high speed; the horizon measure adapts the step near silhouettes and yields good-quality silhouettes. We also provide a Taylor test, which has flavours of interval arithmetic and makes rendering with the adaptive marching points algorithm robust.

While evaluating S(x, y, z) = 0 we never form the ray-dependent polynomial F_f(t) = 0 in terms of its coefficients in t. Computing S(x, y, z) directly saves a large computational overhead, as there are O(d^3) coefficients in t for a surface of degree d; our method needs only the value of S(x, y, z), not the expensive coefficients. The derivative F'_f(t) can also be calculated efficiently from the gradient of S as grad(S(x, y, z)) · D_f, where D_f is the ray direction. The Barth decic, for example, can be evaluated as S(x, y, z) using about 30 terms, but computing all 11 coefficients of the tenth-order polynomial F_f(t) requires evaluating 1373 terms. We also render dynamic implicit surfaces that vary with time.

Overall, a simple algorithm that fits the SIMD architecture of the GPU results in high performance. We ray trace algebraic surfaces up to order 18 and non-algebraic surfaces, including a Blinn blobby with 30 spheres, at better than interactive frame rates. Our adaptive marching points algorithm is an ideal match for the SIMD model of the GPU due to the low computational cost per operation.

We use analytical methods for ray tracing surfaces up to order 4, achieving 3750 fps on a cubic surface and 1400 fps on a quartic surface. With the robust Mitchell method on surfaces up to order 5, we achieve up to 400 fps on the torus (a quartic) and 85 fps on a quintic surface. Our adaptive marching points method renders high-order implicit surfaces at interactive frame rates, rendering a surface of order 18 at 158 fps. These experiments used an NVIDIA 8800 GTX at a resolution of 512x512. Our GPU Objects system renders the Bunny model with 35,947 spheres at 57 fps, a model with 99,130 spheres at 30 fps, and a hyperboloid with reflection and refraction at 300 fps; the GPU Objects experiments used an NVIDIA 6600 GTX with a 512x512 viewport.
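The core of the marching-points idea can be illustrated with a small CPU-side sketch: the implicit function S(x, y, z) is evaluated directly at sample points along the ray, and a root is reported when consecutive samples change sign. This is only an illustrative Python reference of the idea, not the GPU shader implementation of the thesis; the torus surface, step size and bisection refinement below are choices made for the example.

```python
import numpy as np

# Example implicit surface S(x, y, z) = 0: a torus (quartic) with ring radius R
# and tube radius r. Any other implicit function could be plugged in unchanged.
def torus(p, R=1.0, r=0.3):
    x, y, z = p
    return (x * x + y * y + z * z + R * R - r * r) ** 2 - 4 * R * R * (x * x + y * y)

def march_ray(S, origin, direction, t_max=10.0, step=1e-2):
    """March along origin + t * direction in equal steps; on a sign change,
    refine the bracketed root by bisection. Returns None if no root is found."""
    t_prev, f_prev = 0.0, S(origin)
    t = step
    while t <= t_max:
        f = S(origin + t * direction)
        if f_prev * f <= 0.0:                     # sign change: a root lies in [t_prev, t]
            lo, hi, f_lo = t_prev, t, f_prev
            for _ in range(40):                   # bisection refinement
                mid = 0.5 * (lo + hi)
                f_mid = S(origin + mid * direction)
                if f_lo * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            return 0.5 * (lo + hi)
        t_prev, f_prev = t, f
        t += step
    return None

o = np.array([0.0, -3.0, 0.0])
d = np.array([0.0, 1.0, 0.0])
print(march_ray(torus, o, d))                     # hits the torus at t = 1.7
```

In the actual method the step size is adapted using the proximity and horizon measures rather than kept constant, and S is evaluated in a shader for all pixels in parallel.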


Year of completion: December 2008
Advisor: P. J. Narayanan

Related Publications

  • Jag Mohan Singh and P. J. Narayanan - Real-Time Ray Tracing of Implicit Surfaces on the GPU, IEEE Transactions on Visualization and Computer Graphics, Vol. 16(2), pp. 261-272, 2010. [PDF]

  • Visesh Chari, Jag Mohan Singh and P. J. Narayanan - Augmented Reality using Over-Segmentation, Proceedings of the National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG'08), Jan 11-13, 2008, DA-IICT, Gandhinagar, India. [PDF]

  • Kedarnath Thangudu, Lakshmi Gade, Jag Mohan Singh and P. J. Narayanan - Point Based Representations for Hierarchical Environments, in International Conference on Computing: Theory and Applications (ICCTA), Kolkata, 2007. [PDF]

  • Jag Mohan Singh and P. J. Narayanan - Progressive Decomposition of Point Clouds Without Local Planes, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338, pp. 364-375, 2006. [PDF]

  • Sunil Mohan Ranta, Jag Mohan Singh and P. J. Narayanan - GPU Objects, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338, pp. 352-363, 2006. [PDF]


Downloads

thesis

 ppt

 

 

Multiple View Geometry Applications to Robotic and Visual Servoing


Visesh Chari (homepage)

Computer vision may be described as the process of scene understanding through analysis of images captured by a camera. Understanding a scene has several aspects, which makes computer vision a very vibrant and active field of research. For example, one section of computer vision research concentrates on understanding the inherent characteristics of an object (making identifications like "this is a face" or "this is a car"). Another branch focuses on answering questions like "find the given face in this image" or "find where cars occur in this image". Yet another, more primitive branch concerns itself with estimating the geometry of the scene, answering questions like "what is the shape of this face" or "how would this car look from that viewpoint". This branch and its various derivatives go under the name Multiple View Geometry. Technically, Multiple View Geometry concerns itself with the geometric interaction of the 3D world with the images captured by a camera, and with the interpretation and manipulation of this information for various tasks.

Multiple View Geometry (MVG) is about two decades old as a research area and borrows heavily from the related field of photogrammetry. Over the years, many algorithms have been proposed for estimating geometric quantities such as the transformation between cameras viewing a scene or the 3D structure of an object viewed by multiple cameras. The field has matured recently, with the focus shifting towards producing globally optimal estimates of geometric quantities like transformations and structure, and towards analysing cases where the problem of geometric inference or manipulation is NP-complete. Even before this maturity, many algorithms in Multiple View Geometry had found applications: the simple mosaicing feature available in digital cameras these days owes its origin to one such algorithm. Applications have also been spawned in areas like animation for films, robot motion in automated surgery and industrial environments, and security systems that employ hundreds of cameras.


This thesis focuses on the application front of Multiple View Geometry, which has started gaining popularity. To this end, we leverage concepts from MVG to develop new frameworks and algorithms for a variety of problems.

For this reason, we choose to explore the use of MVG in various robotics and computer vision tasks in this thesis. We first propose a tracking framework that utilizes cues like texture and edges to track 2D and 3D objects across views of a scene. Tracking refers to the task of estimating the location and orientation of an object with respect to a pre-defined world coordinate system. Traditionally, filters like the Kalman filter and its variants have been used for tracking. Problems like illumination change and occlusion affect many of these algorithms, which make assumptions such as uniform intensity of objects across views. We show that by embedding MVG into tracking algorithms we can achieve efficient tracking of objects that is robust to large changes in perspective, illumination and occlusion. A by-product is the estimation of the pose of the camera, which is itself useful for tasks like localization in a mobile environment.
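As an illustration of the kind of multiple-view-geometry machinery such a tracker builds on (and not the tracking framework of the thesis itself), the sketch below estimates a planar homography from point correspondences with the normalized DLT algorithm; the point arrays in the toy check are hypothetical inputs.

```python
import numpy as np

def normalize(pts):
    """Similarity transform that centres the points and scales their mean norm to sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.mean(np.linalg.norm(pts - c, axis=1))
    T = np.array([[s, 0.0, -s * c[0]],
                  [0.0, s, -s * c[1]],
                  [0.0, 0.0, 1.0]])
    return T, (pts - c) * s

def dlt_homography(src, dst):
    """Estimate H such that dst ~ H @ src (homogeneous coordinates)
    from four or more point correspondences using the normalized DLT."""
    Ts, ns = normalize(np.asarray(src, float))
    Td, nd = normalize(np.asarray(dst, float))
    rows = []
    for (x, y), (u, v) in zip(ns, nd):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    H = Vt[-1].reshape(3, 3)
    H = np.linalg.inv(Td) @ H @ Ts        # undo the normalizing transforms
    return H / H[2, 2]

# toy check: recover a known homography from noiseless correspondences
H_true = np.array([[1.1, 0.02, 5.0], [-0.01, 0.95, -3.0], [1e-4, 2e-4, 1.0]])
src = np.random.rand(20, 2) * 100
hom = np.c_[src, np.ones(20)] @ H_true.T
dst = hom[:, :2] / hom[:, 2:]
print(np.allclose(dlt_homography(src, dst), H_true, atol=1e-6))
```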

 

 

Then we show an application of frequency-domain MVG to the task of robot positioning. Positioning (or visual servoing) is the task of enabling a robot to assume a desired pose with respect to an object of interest with the help of a camera. This object might be a heart, as in surgery, or an automobile part, as in industrial settings. We show that by using frequency-domain techniques in MVG, we can obtain algorithms that require only rough correspondence between images, unlike earlier algorithms that needed specific point-to-point correspondences. This is further developed into a general servoing framework capable of straight Cartesian paths and path following, which are recent problems in servoing.
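One concrete example of a frequency-domain technique that needs no point-to-point correspondences is phase correlation, which recovers the translation between two images from the phase of their cross power spectrum. The sketch below is only meant to illustrate that flavour of computation; it is not the servoing controller or the planar-contour method developed in the thesis.

```python
import numpy as np

def phase_correlation(a, b):
    """Recover the integer translation between two same-sized images using the
    Fourier shift theorem; no point correspondences are required."""
    Fa, Fb = np.fft.fft2(a), np.fft.fft2(b)
    cross = Fa * np.conj(Fb)
    cross /= np.maximum(np.abs(cross), 1e-12)     # keep only the phase
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # wrap shifts larger than half the image size to negative values
    if dy > a.shape[0] // 2: dy -= a.shape[0]
    if dx > a.shape[1] // 2: dx -= a.shape[1]
    return dy, dx

img = np.random.rand(128, 128)
shifted = np.roll(img, shift=(5, -9), axis=(0, 1))
print(phase_correlation(shifted, img))            # approximately (5, -9)
```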

 

 

 

Within computer vision, we explore the use of MVG for various image and video editing tasks. Tasks like removing a scene object from a video in a consistent manner fall in this category (predicting how the video would look without the object). In this area, we propose an algorithm for video inpainting, in which specific objects are removed from a video and the resulting space-time holes are filled in a consistent manner. The algorithm is fully automatic, unlike traditional image and video inpainting algorithms, and takes two functions as input: one defines the object to be removed, and the other defines the background model used for hole filling.

 

 

 

 


We then extend this algorithm to obtain image extrapolation, which is concerned with predicting the future of a scene from the content available about it. This differs from inference in the sense that no data is actually available to confirm our predictions, and hence several alternatives remain equally viable. In this direction, we propose an inpainting-based framework for Image Based Rendering (IBR). IBR concerns itself with algorithms for an image-based representation of the 3D information of a scene, from which novel views of the scene can be rendered. We extend IBR to cases where the 3D information about a scene is incomplete, by incorporating information about the type of scene being viewed (e.g., the face of a person). We then devise algorithms to transfer specific semantic characteristics to the current scene from similar scenes available to us.

 

 

 

Year of completion: December 2008
Advisors: C. V. Jawahar and P. J. Narayanan

Related Publications

  • Visesh Chari, Avinash Sharma, Anoop M. Namboodiri and C. V. Jawahar - Frequency Domain Visual Servoing using Planar Contours, IEEE Sixth Indian Conference on Computer Vision, Graphics & Image Processing (ICVGIP 2008), pp. 87-94, 16-19 Dec, 2008, Bhubaneswar, India. [PDF]

  • Visesh Chari, C. V. Jawahar and P. J. Narayanan - Video Completion as Noise Removal, Proceedings of the National Conference on Communications (NCC'08), Feb 1-3, 2008, IIT Mumbai, India. [PDF]

  • Visesh Chari, Jag Mohan Singh and P. J. Narayanan - Augmented Reality using Over-Segmentation, Proceedings of the National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG'08), Jan 11-13, 2008, DA-IICT, Gandhinagar, India. [PDF]

  • A. H. Abdul Hafez, Visesh Chari and C. V. Jawahar - Combining Texture and Edge Planar Tracker based on a Local Quality Metric, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA'07), Roma, Italy, 2007. [PDF]

 


Downloads

thesis

 ppt

Scalable Primitives for Data Mapping and Movement on the GPU


Suryakant Patidar (homepage)

 

GPUs have been used increasingly for a wide range of problems involving heavy computation in graphics, computer vision, scientific processing, etc. One of the key reasons for their wide acceptance is their high performance-to-cost ratio. In less than a decade, GPUs have grown from non-programmable graphics co-processors into general-purpose units with a high-level language interface that deliver 1 TFLOPS for $400. The GPU's architecture, including its core layout, memory and scheduling, is largely hidden and changes more frequently than single-core and multi-core CPU architectures. This makes it difficult for non-expert users to extract good performance, and suboptimal implementations can pay severe performance penalties on the GPU. This is likely to persist as many-core architectures and massively multithreaded programming models gain popularity in the future.

One way to exploit the GPU's computing power effectively is through high-level primitives upon which other computations can be built. All architecture-specific optimizations can be incorporated into the primitives by designing and implementing them carefully. Data-parallel primitives act as building blocks for many other algorithms on the fundamentally SIMD architecture of the GPU; operations like sorting and searching have been implemented for large data sets.

We present efficient implementations of a few primitives for data mapping and data distribution on the massively multithreaded architecture of the GPU. The split primitive distributes the elements of a list according to their category. Split is an important operation for data mapping and is used to build data structures, distribute work load, perform database join queries, etc. Simultaneous operations on common memory are the main difficulty for parallel split and other operations on the GPU. He et al. overcame the problem of simultaneous writes by using a personal memory space for each parallel thread on the GPU; the small shared memory available limits the maximum number of categories they can handle to 64. We use hardware atomic operations for split and propose ordered atomic operations for a stable split that maintains the original order of elements belonging to the same category. Due to the limited shared memory, such a split can handle at most 2048 bins in a single pass. For larger numbers of bins, we propose an iterative split approach that can handle billions of bins using multiple passes of the stable split operation over an efficient basic split with a fixed number of categories. We also present a variant of split that partitions the indexes of records. This facilitates the use of the GPU as a co-processor for split or sort, with the actual data movement handled separately. We can compute the split indexes for a list of 32 million records in 180 ms for a 32-bit key and in 800 ms for a 96-bit key.
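A sequential NumPy reference of the stable split semantics may make the primitive concrete: a per-category histogram followed by an exclusive prefix sum gives each category its starting offset, and elements are then assigned destination indexes in order. On the GPU these counters live in shared memory and are incremented with (ordered) atomic operations; the code below is only a CPU-side stand-in for those semantics.

```python
import numpy as np

def stable_split(keys, num_bins):
    """Reference semantics of the stable split primitive: return the destination
    index of every element so that elements are grouped by category while the
    original order inside each category is preserved."""
    hist = np.bincount(keys, minlength=num_bins)          # per-category counts
    start = np.concatenate(([0], np.cumsum(hist)[:-1]))   # exclusive prefix sum
    offset = start.copy()
    dest = np.empty(len(keys), dtype=np.int64)
    for i, k in enumerate(keys):                          # sequential stand-in for
        dest[i] = offset[k]                               # ordered atomic increments
        offset[k] += 1
    return dest

keys = np.array([2, 0, 1, 2, 0, 1, 1])
dest = stable_split(keys, 3)
out = np.empty_like(keys)
out[dest] = keys                                          # scatter elements by split index
print(dest, out)                                          # out is [0 0 1 1 1 2 2]
```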

The gather and scatter primitives perform fast, distributed data movement. Efficient data movement is critical to high performance on GPUs, as suboptimal memory accesses pay heavy penalties. In spite of the high bandwidth (130 GBps) offered by current GPUs, a naive implementation of these operations hampers performance and utilizes only a part of the bandwidth. The instantaneous locality of memory reference plays a critical role in data movement on current GPU memory architectures. For scatter and gather involving large records, we use collective data movement in which multiple threads cooperate on individual records to improve the instantaneous locality: multiple threads move the bytes of a record, maximizing coalesced memory access. Our implementation of the gather and scatter operations efficiently moves multi-element records on the GPU. These data-movement primitives can be used in conjunction with split for splitting large records on the GPU.

We extend the split primitive to derive a SplitSort algorithm that can sort 32-bit, 64-bit and 128-bit integers on the GPU. SplitSort is faster than the best GPU sort available today, by Satish et al., for 32-bit integers, and to our knowledge we are the first to present results on sorting 64-bit and longer integers on the GPU. With the split and gather operations we can sort large data records by first sorting their indexes and then moving the original data efficiently. We show sorting of 16 million 128-byte records in 379 ms with 4-byte keys and in 556 ms with 8-byte keys.
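The way an index-only sort combines with the data-movement primitives can be sketched as follows: only the keys (or their indexes) are sorted, and the full records are then moved in a single gather. The sizes and dtypes below are illustrative, and NumPy's argsort stands in for SplitSort; the GPU version moves each record cooperatively with multiple threads.

```python
import numpy as np

num_records, record_bytes = 1 << 20, 128
keys = np.random.randint(0, 1 << 31, size=num_records, dtype=np.int64)
records = np.random.randint(0, 256, size=(num_records, record_bytes), dtype=np.uint8)

order = np.argsort(keys, kind="stable")   # sort only the (index, key) pairs
sorted_records = records[order]           # then move whole records in one gather
assert np.all(np.diff(keys[order]) >= 0)
```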

Using our fast split primitive, we explore the problem of real-time ray casting of large deformable models (over a million triangles) on large displays (a million pixels) on an off-the-shelf GPU. We build a GPU-efficient three-dimensional data structure for this purpose and a corresponding algorithm that uses it for fast ray casting. The data structure provides blocks of pixels and their corresponding geometry in a list of cells, so each block of pixels can work in parallel on the GPU, contributing a region of the final image. Our algorithm builds the data structure rapidly using the split operation (5 ms for 1 million triangles) and can thus support deforming geometry by rebuilding the data structure every frame. Early work on ray tracing by Purcell et al. built the data structures once on the CPU and used the GPU for repeated ray tracing using that structure. Recent work on ray tracing of deformable objects by Lauterbach et al. handles models with up to 200K triangles at 7-10 fps. We achieve real-time rates (25 fps) for ray casting a million-triangle model onto a million pixels on current NVIDIA GPUs.

The primitives we propose are widely applicable, and our results show that their performance scales logarithmically with the number of categories, linearly with the list length, and linearly with the number of cores on the GPU. This makes them useful for applications that deal with large data sets. The ideas presented in the thesis are likely to extend to later models and architectures of the GPU as well as to other multi-core architectures. Our implementations of the data primitives, viz. split, sort, gather, scatter and their combinations, are expected to be widely used by future GPU programmers. A recent algorithm by Vineet et al. that computes the minimum spanning tree of large graphs uses the split primitive to improve performance.

 

Year of completion: June 2009
Advisor: P. J. Narayanan

Related Publications

  • Vibhav Vineet, Pawan Harish, Suryakant Patidar and P. J. Narayanan - Fast Minimum Spanning Tree for Large Graphs on the GPU, Proceedings of High-Performance Graphics (HPG 09), 1-3 August, 2009, Louisiana, USA. [PDF]

  • Shiben Bhattacharjee, Suryakant Patidar and P. J. Narayanan - Real-time Rendering and Manipulation of Large Terrains, IEEE Sixth Indian Conference on Computer Vision, Graphics & Image Processing (ICVGIP 2008), pp. 551-559, 16-19 Dec, 2008, Bhubaneswar, India. [PDF]

  • Suryakant Patidar and P. J. Narayanan - Ray Casting Deformable Model on the GPU, IEEE Sixth Indian Conference on Computer Vision, Graphics & Image Processing (ICVGIP 2008), pp. 481-488, 16-19 Dec, 2008, Bhubaneswar, India. [PDF]

  • Soumyajit Deb, Shiben Bhattacharjee, Suryakant Patidar and P. J. Narayanan - Real-time Streaming and Rendering of Terrains, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338, pp. 276-288, 2006. [PDF]

  • Suryakant Patidar, P. J. Narayanan - Scalable Split and Sort Primitives using Ordered Atomic Operations on the GPU, High Performance Graphics (Poster), April 2009. [PDF]
  • Kishore K, Rishabh M, Suhail Rehman, P. J. Narayanan, Kannan S and Suryakant Patidar - A Performance Prediction Model for the CUDA GPGPU Platform, International Conference on High Performance Computing, April 2009. [PDF]
  • Suryakant Patidar, Shiben Bhattacharjee, Jag Mohan Singh, P. J. Narayanan - Exploiting the Shader Model 4.0 Architecture, IIIT/TR/2007/145, March 2007. [PDF]

Downloads

thesis

 ppt

 

 

Learning in Large Scale Image Retrieval Systems


Pradhee Tandon

The recent explosive growth in the images and videos accessible to any individual on the Internet has made automatic management the prime choice. Contemporary systems used tags, but over time these were found to be inadequate and unreliable. This has brought content-based retrieval and management of such data to the forefront of research in the information retrieval community.

Content-based retrieval methods generally represent the visual information in images or videos in terms of machine-centric numeric features. These allow efficient processing and management of large volumes of data. The features are primitive, however, and are incapable of capturing the way humans perceive visual content. This leads to a semantic gap between the user and the system, resulting in poor user satisfaction. User-centric techniques are needed to help reduce this gap efficiently. Given the ever-expanding volume of images and videos, such techniques should also be able to retrieve in real time from millions of samples. A practical image retrieval approach, in summary, is expected to perform well on most of the following parameters: (1) acceptable accuracy, (2) efficiency, (3) minimal and non-cumbersome user input, (4) scalability to large collections (millions) of objects, (5) support for interactive retrieval, and (6) meaningful presentation of results. Most of the noted efforts in the CBIR literature have focused primarily on providing answers to only subsets of the above, whereas a real-world system requires practical solutions to nearly all of them. In this thesis we propose our solutions for an image retrieval application, keeping in mind the above expectations.


In this thesis we present our system for interactive image retrieval from huge databases. The system has a web-based interface that emphasizes ease of use and encourages interaction, and it is modeled on the query-by-example paradigm. It uses an efficient B+-tree based dimensional indexing scheme for retrieving similar images from millions in less than a second. Perception of visual similarity is subjective; to serve these varying interpretations, the index has to be adaptive. Our system supports user interaction through feedback, and our index is designed to support changing similarity metrics using this feedback while still responding in sub-second retrieval times. We have also optimized the basic B+-tree based indexing scheme to achieve better performance when learning is available.

 


 

Content-based access to images requires the visual information in them to be abstracted into numeric features. These features generally represent low-level visual characteristics of the data, like colors, textures and shapes. They are inherently weak and cannot represent human perception of visual content, which has been shaped over years of evolution. Relevance feedback from the user has been widely accepted as a means to bridge this semantic gap. In this thesis, we propose an inexpensive, feedback-driven, feature relevance learning scheme. We estimate iteratively improving relevance weights for the low-level numeric features; these weights capture the relevant visual content and are used to tune the similarity metric and iteratively improve retrieval for the active user. We propose to incrementally memorize this learning across users, for the set of relevant images in each query, which helps the system converge to the popular content in the images in the database. We also use this learning in the similarity metric to tune retrieval further. Our learning scheme integrates seamlessly with our index, making interactive, accurate retrieval possible.
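One common way to turn relevance feedback into feature weights, shown below purely as an illustration (the exact update rule in the thesis may differ), is to weight each feature dimension by the inverse of its spread over the images the user marked relevant and to plug the weights into a weighted Euclidean metric.

```python
import numpy as np

def update_relevance_weights(relevant_feats, eps=1e-6):
    """Weight each feature dimension by the inverse of its spread over the images
    the user marked relevant, so dimensions on which the relevant images agree
    dominate the similarity metric. Weights are normalized to sum to 1."""
    w = 1.0 / (relevant_feats.std(axis=0) + eps)
    return w / w.sum()

def weighted_distance(query, database, w):
    """Weighted Euclidean distance of every database feature vector to the query."""
    return np.sqrt((((database - query) ** 2) * w).sum(axis=1))

# toy usage: 1000 database images with 64-dimensional features,
# pretending the first 5 were marked relevant by the user
database = np.random.rand(1000, 64)
query = np.random.rand(64)
w = update_relevance_weights(database[:5])
ranking = np.argsort(weighted_distance(query, database, w))
```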

Feature-based learning improves accuracy; however, it is critically dependent on the low-level features used, whereas human perception is independent of them. Based on the underlying assumption that user opinion on the same image remains similar over time and across users, a content-free approach has recently become popular. The idea relies on collaborative filtering of user interaction logs to predict the next set of results for the active user. This method performs better by virtue of being independent of primitive features, but suffers from the critical cold-start problem. In this thesis we propose a Bayesian inference approach for integrating similarity information from these two complementary sources, which also overcomes the critical shortcomings of the two paradigms. We pose the problem as one of posterior estimation: the logs provide a priori information in terms of the co-occurrence of images, while visual similarity provides the evidence of matching. We efficiently archive and update the co-occurrence relationships, facilitating sub-second retrieval.
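The combination itself can be written in a few lines: smoothed co-occurrence counts act as the prior and the visual similarity as the likelihood, and their product gives the posterior used for ranking. This is a schematic sketch of the idea under those assumptions, not the exact model of the thesis.

```python
import numpy as np

def posterior_scores(visual_sim, cooccurrence_counts):
    """Posterior ranking score for every database image:
    posterior ∝ likelihood (visual similarity to the query)
              × prior (smoothed co-occurrence with images already accepted by users)."""
    prior = cooccurrence_counts + 1.0            # add-one smoothing softens the cold start
    prior = prior / prior.sum()
    likelihood = visual_sim / visual_sim.sum()
    posterior = likelihood * prior
    return posterior / posterior.sum()

sim = np.random.rand(1000)                       # visual similarity of 1000 images to the query
counts = np.random.poisson(0.2, size=1000).astype(float)
ranking = np.argsort(-posterior_scores(sim, counts))
```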


Studies have shown that users refine their queries based on the results they are shown, and that the quality of retrieval improves with the effectiveness of the learning acquired by the system. Presenting the right images to the user for feedback can thus lead to the best retrieval. A set of results that are similar to the query in different ways helps the user narrow down to the intended query as early as possible; this diversity cannot be achieved by similarity retrieval alone. In this thesis, we propose to use skyline queries to efficiently remove conceptual redundancy from the retrieval set, and the resulting diversely similar set of results is presented to the user for feedback. We use our indexing scheme to extract the skyline efficiently, a computationally prohibitive process otherwise. The user's perception changes with the results, and so should the nature of the diverse set. We propose the idea of preferential skylines to handle this: using the user's feedback-based preferences, we reduce the diversity and include more similarity along the preferred attributes, gradually tuning the retrieval to match the user's exact intent. This provides improved accuracy in far fewer iterations.
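The skyline (Pareto-optimal) set over per-attribute distances is easy to state, even though computing it naively is quadratic; the sketch below shows the definition we rely on, with a brute-force filter standing in for the index-assisted extraction used in the thesis.

```python
import numpy as np

def skyline(dists):
    """Indexes of the skyline images. dists[i, j] is the distance of image i to the
    query on attribute j; image i is dominated if some other image is at least as
    close on every attribute and strictly closer on at least one."""
    n = dists.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        dominators = np.all(dists <= dists[i], axis=1) & np.any(dists < dists[i], axis=1)
        if dominators.any():
            keep[i] = False
    return np.flatnonzero(keep)

dists = np.random.rand(500, 4)    # 500 candidates, 4 attribute-wise distances to the query
diverse_candidates = skyline(dists)
```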

We validate all the ideas proposed in the thesis with experiments on real natural images, and we also employ synthetic datasets for other computational experiments. We note that, in spite of the considerable improvements in accuracy brought by our learning approaches, the effectiveness of our solutions is still limited by the features used to encode human visual perception; our methods of learning and other optimizations are only a means to reduce this gap. We present extensive experimental results, with discussions validating different aspects of our expectations from the proposed ideas, throughout this thesis.

 

Year of completion: June 2009
Advisors: C. V. Jawahar and Vikram Pudi

Related Publications

  • Pradhee Tandon and C. V. Jawahar - Long Term Learning for Content Extraction in Image Retrieval, Proceedings of the 15th National Conference on Communications (NCC 2009), Jan 16-18, 2009, Guwahati, India. [PDF]

  • Pradhee Tandon, Piyush Nigam, Vikram Pudi and C. V. Jawahar - FISH: A Practical System for Fast Interactive Image Search in Huge Databases, Proceedings of the 7th ACM International Conference on Image and Video Retrieval (CIVR '08), July 7-9, 2008, Niagara Falls, Canada. [PDF]


Downloads

thesis

ppt

Document Enhancement Using Text Specific Prior


Jyotirmoy Banerjee

Document images are often obtained by digitizing paper documents like books or manuscripts. They can be poor in appearance due to degradation of the paper, spreading and flaking of ink or toner, imaging artifacts, etc. These phenomena lead to different types of noise at the word level, including boundary erosion, dilation, cuts/breaks and merges of characters. Further, with the advent of modern electronic gadgets like PDAs, cellular phones and digital cameras, the scope of document imaging has widened, and document image analysis systems are becoming increasingly visible in everyday life. For instance, one may be interested in systems that process, store and understand document images captured by cellular phones. The processing challenges in this class of documents are considerably different from those of conventional scanned document images: many of these new documents are characterized by low resolution and poor quality. Super-resolution provides an algorithmic solution to the resolution enhancement problem by exploiting image-specific a priori information. In this thesis we study and propose new methods for the restoration and resolution enhancement of document images.

We present a single-image super-resolution algorithm for gray-level document images that does not use any training set. Super-resolution of document images is characterized by bimodality, smoothness along the edges and subsampling consistency. These characteristics are enforced in a Markov Random Field (MRF) framework by defining an appropriate energy function; subsampling the super-resolved image returns the original low-resolution one, confirming the consistency of the method. The restored image is generated by iteratively reducing the energy of the MRF, which is a nonlinear optimization problem. This single-frame approach is useful when multiple low-resolution images are not available.

Document images have a repetitive structure, as characters and words occur more than once in a page or book, so the extraction of a single high-quality text image from a set of degraded instances benefits from this a priori information. A character segmentation is performed to extract the characters, and a total-variation based prior is used in a Maximum A Posteriori (MAP) estimate to smooth the edges and preserve the corners that are so characteristic of text images. The dependence on character segmentation, however, remains a bottleneck: character segmentation is not a completely solved problem, and its accuracy depends on the amount of noise in the text image.

Our next approach therefore removes the dependency on character segmentation: we look for a restoration approach that does not perform an explicit character segmentation but still uses the repetitive nature of document images. Degradation varies across a document, and context plays an important role in textual image understanding. We propose an MRF framework that exploits the contextual relations between image patches. Using the topological/spatial constraints between the patches, impossible combinations are eliminated from the initial set of matches, resulting in an unambiguous textual output; local consistency is reconciled with global consistency using the belief propagation algorithm. Since we work with patches and not characters, we avoid performing an explicit segmentation.

The ability to work with larger patch sizes allows us to deal with severe degradations, including cuts, blobs, merges and vandalized documents. This approach can also integrate document restoration and super-resolution into a single framework, directly generating high-quality images from degraded documents. To conclude, the thesis presents an approach for reconstructing document images in which, unlike conventional reconstruction methods, the unknown pixel values are estimated not from their local neighbourhood alone but from the whole image, exploiting the multiple occurrences of characters in the scanned document. A great advantage of our approach over conventional ones is that we have more information at our disposal, which leads to better enhancement of the document image. Experimental results showing significant improvement in image quality on document images collected from various sources, including magazines and books, comprehensively demonstrate the robustness and adaptability of the approach.
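The three properties enforced in the MRF can be written down as an illustrative energy over a candidate high-resolution image. The potentials below (block-average subsampling consistency, squared neighbour differences for smoothness, and a double-well bimodality term) are assumptions made for this sketch and are not the exact terms used in the thesis.

```python
import numpy as np

def energy(hr, lr, scale=2, lam_smooth=0.1, lam_bimodal=0.05):
    """Illustrative MRF-style energy for text super-resolution combining the three
    properties described above. hr has shape (scale*H, scale*W) and lr has shape
    (H, W), both with intensities in [0, 1]; the exact potentials in the thesis differ."""
    h, w = lr.shape
    # subsampling consistency: block-averaging the high-res image should give back lr
    down = hr.reshape(h, scale, w, scale).mean(axis=(1, 3))
    data = ((down - lr) ** 2).sum()
    # smoothness: penalize differences between neighbouring high-res pixels
    smooth = ((hr[:, 1:] - hr[:, :-1]) ** 2).sum() + ((hr[1:, :] - hr[:-1, :]) ** 2).sum()
    # bimodality: a double-well term that is lowest for pixels near ink (0) or paper (1)
    bimodal = (hr ** 2 * (1.0 - hr) ** 2).sum()
    return data + lam_smooth * smooth + lam_bimodal * bimodal

lr = np.random.rand(16, 16)
hr = np.repeat(np.repeat(lr, 2, axis=0), 2, axis=1)   # trivial upsampling; the data term is 0 here
print(energy(hr, lr))
```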

 

Year of completion: 2009
Advisor: C. V. Jawahar

Related Publications

  • Jyotirmoy Banerjee, Anoop M. Namboodiri and C. V. Jawahar - Contextual Restoration of Severely Degraded Document Images, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 09), 20-25 June, 2009, Miami Beach, Florida, USA. [PDF]

  • Jyotirmoy Banerjee and C. V. Jawahar - Super-resolution of Text Images Using Edge-Directed Tangent Field, in the Eighth IAPR Workshop on Document Analysis Systems (DAS), Sep 17-19, 2008, Nara, Japan. [PDF] [Received Honorable Mention award]

  • Jyotirmoy Banerjee and C. V. Jawahar - Restoration of Document Images Using Bayesian Inference, in the National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), Jan 11-13, 2008, Gandhinagar, Gujarat, India. [PDF]


Downloads

thesis

ppt
