Representation of Ballistic Strokes of Handwriting for Recognition and Verification.


Prabhu Teja S (homepage)

Computing device interfaces are moving away from traditional keyboard and mouse input towards touch and stylus based interaction. To improve their effectiveness, the interfaces for such devices should be made robust, efficient, and intuitive. One of the most natural modes of communication for humans has been handwriting. Handwriting based interfaces are more practical than keyboards, especially for scripts of the Indic and Han families, which have a large number of symbols. Pen based computing serves four functions: (a) pointing input, (b) handwriting recognition, (c) direct manipulation, and (d) gesture recognition. The second and fourth fall under the broad umbrella of pattern recognition problems. In this thesis, we focus on efficient representations for handwriting.

In the first part of this thesis, we propose a representation for online handwriting based on ballistic strokes. We first propose a technique to segment online handwriting into its constituent ballistic strokes based on the curvature profile. We argue that the proposed method of segmentation is more robust to noise than the traditional segmentation at speed profile minima. We then propose to represent each segmented stroke as the arc of a circle. This representation is validated by the Sigma-lognormal theory of handwriting generation. These features are encoded using a bag-of-words representation, which we name bag-of-strokes, and which is shown to achieve state-of-the-art accuracies on various datasets.
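
A minimal sketch of these two steps, in Python with NumPy, is given below; the function names, the curvature threshold, and the standard least-squares (Kasa) circle fit are illustrative assumptions rather than the thesis implementation.

import numpy as np

def segment_by_curvature(points, threshold=0.5):
    # points: N x 2 array of pen positions sampled along a stroke.
    dx, dy = np.gradient(points[:, 0]), np.gradient(points[:, 1])
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    # Signed curvature of a planar curve.
    kappa = (dx * ddy - dy * ddx) / ((dx ** 2 + dy ** 2) ** 1.5 + 1e-12)
    k = np.abs(kappa)
    # Cut the stroke at local maxima of |curvature| above the threshold.
    cuts = [i for i in range(1, len(points) - 1)
            if k[i] > threshold and k[i] >= k[i - 1] and k[i] >= k[i + 1]]
    bounds = [0] + cuts + [len(points) - 1]
    return [points[a:b + 1] for a, b in zip(bounds, bounds[1:])]

def fit_arc(stroke):
    # Kasa circle fit: solve x^2 + y^2 = 2ax + 2by + c in least squares.
    x, y = stroke[:, 0], stroke[:, 1]
    A = np.column_stack([2 * x, 2 * y, np.ones(len(x))])
    (a, b, c), *_ = np.linalg.lstsq(A, x ** 2 + y ** 2, rcond=None)
    return (a, b), np.sqrt(c + a ** 2 + b ** 2)  # arc centre and radius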

In the second part, we extend this representation to the problem of signature verification. To build a verification system, a similarity metric must be defined. We propose a metric learning algorithm based on the Support Vector Machine (SVM) hyperplanes learned to separate the training data. This results in a very simple metric learning strategy that can be extended as new users are registered with the verification system. We experiment with this technique on the publicly available SVC-2004 database and show that it achieves accuracies applicable to practical scenarios. (more...)
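
As a rough illustration of how per-user SVM hyperplanes can drive verification, the sketch below enrolls one linear SVM per user and thresholds the signed distance to that user's hyperplane; this is a simplified stand-in for the proposed metric learning algorithm, and the class and parameter names are hypothetical.

import numpy as np
from sklearn.svm import LinearSVC

class HyperplaneVerifier:
    def __init__(self):
        self.models = {}  # one separating hyperplane per enrolled user

    def enroll(self, user, genuine, impostor):
        # genuine/impostor: arrays of signature feature vectors
        # (e.g. bag-of-strokes histograms).
        X = np.vstack([genuine, impostor])
        y = [1] * len(genuine) + [0] * len(impostor)
        self.models[user] = LinearSVC(C=1.0).fit(X, y)
        # Registering a new user adds a hyperplane; no other model changes.

    def verify(self, user, features, threshold=0.0):
        # Signed distance to the hyperplane serves as the similarity score.
        return self.models[user].decision_function([features])[0] > threshold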

 

Year of completion: July 2015
Advisor: Anoop M. Namboodiri


Related Publications

  • Prabhu Teja S. and Anoop M. Namboodiri - A Ballistic Stroke Representation of Online Handwriting for Recognition, Proceedings of the 12th International Conference on Document Analysis and Recognition, 25-28 Aug. 2013, Washington DC, USA. [PDF]


Downloads

thesis

 ppt

Optimizing Average Precision using Weakly Supervised Data.


Aseem Behl (homepage)

Many tasks in computer vision, such as action classification and object detection, require us to rank a set of samples according to their relevance to a particular visual category. The performance of such tasks is often measured in terms of the average precision (AP). Yet it is common practice to employ the support vector machine (SVM) classifier, which optimizes a surrogate of the 0-1 loss. The popularity of SVM can be attributed to its empirical performance. Specifically, in fully supervised settings, SVM tends to provide accuracy similar to AP-SVM, which directly optimizes an AP-based loss. However, we hypothesize that in the significantly more challenging and practically useful setting of weakly supervised learning, it becomes crucial to optimize the right accuracy measure. In order to test this hypothesis, we propose a novel latent AP-SVM that minimizes a carefully designed upper bound on the AP-based loss function over weakly supervised samples. Using publicly available datasets, we demonstrate the advantage of our approach over standard loss-based learning frameworks on three challenging problems: action classification, character recognition and object detection. (more...)
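
For concreteness, the AP of a predicted ranking (one minus which is the loss being bounded) can be computed as in the short Python sketch below; this is an illustration, not code from the paper.

import numpy as np

def average_precision(scores, labels):
    # AP = mean of precision@k over the ranks k of the relevant samples.
    order = np.argsort(-np.asarray(scores, dtype=float))
    ranked = np.asarray(labels)[order]
    precisions = np.cumsum(ranked) / np.arange(1, len(ranked) + 1)
    return precisions[ranked == 1].mean()

# The AP-based loss of a perfect ranking is 0.
print(1 - average_precision([0.9, 0.2, 0.7, 0.4], [1, 0, 1, 0]))  # 0.0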

 

Year of completion: July 2015
Advisors: Prof. C.V. Jawahar and Dr. M. Pawan Kumar


Related Publications

  • Aseem Behl, Pritish Mohapatra, C. V. Jawahar, M. Pawan Kumar - Optimizing Average Precision using Weakly Supervised Data, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI 2015). [PDF]

  • Puneet K. Dokania, Aseem Behl, C.V. Jawahar, M. Pawan Kumar - Learning to Rank using High-Order Information, Proceedings of the European Conference on Computer Vision, 06-12 Sep 2014, Zurich. [PDF]

  • Aseem Behl, C.V. Jawahar and M. Pawan Kumar - Optimizing Average Precision using Weakly Supervised Data, Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 23-28 June 2014, Columbus, Ohio, USA. [PDF]


Downloads

thesis

 ppt

Two GPU Algorithms for Raytracing


Srinath Ravichandran (homepage)

Raytracing is primarily used for generating photo-realistic imagery. Once considered an offline algorithm, raytracing has become realtime or near-realtime with advances in computer hardware. GPU computing has enabled the general purpose use of graphics cards that were originally designed for graphics rendering. Raytracing can be considered an embarrassingly parallel process, in which each pixel's contribution to the final image is computed independently of the others. The tremendous power of GPUs has been utilized to vastly improve the performance of almost all the functions in a raytracing pipeline.

The raytracing pipeline for rendering images fundamentally consists of three phases. The first is acceleration structure construction, which builds a data structure that answers intersection queries between the traced rays and the objects in the scene very quickly. The second is traversal, in which each ray is traced through the scene using the aforementioned data structure. The final, optional phase is shading, in which each hit point is shaded by a shading algorithm; the algorithm determines the color of the pixel for which the ray was traced, and shading all the rays for all the pixels yields the final rendered image. In this thesis we examine two problems associated with the GPU raytracing pipeline and provide two new approaches for them.

In the first work, we develop a novel parallel algorithm that performs raytracing without constructing any explicit acceleration structure on the GPU. This breaks with the long-standing practice of creating and storing a separate acceleration structure and then tracing rays through it. Our algorithm creates an implicit hierarchical acceleration structure and traverses it in tandem at each phase of the algorithm. Parallel construction algorithms for acceleration structures are difficult to implement on the GPU and also have a large memory footprint for storing the resulting structure. Compared to CPUs, the amount of memory available to GPUs is limited, and hence methods that work with very small memory footprints are of utmost importance. Our algorithm is conceptually very simple, utilizing efficient parallel GPU primitives such as sort and reduce. Further, our algorithm has small memory requirements compared to methods that construct acceleration structures, making it a suitable candidate for the GPU. Since our method employs a traverse-while-construct approach, it is particularly useful for animated scenes, for which a traditional raytracing pipeline has to create the acceleration structure anew every frame. We implement our algorithm on a CUDA enabled machine and show that it performs much better than the serial CPU version of the algorithm.
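
A minimal recursive CPU sketch of the traverse-while-construct idea, restricted to spheres, is given below; the actual GPU algorithm replaces the recursion with parallel sort and reduce passes, and the names, leaf size, and median split here are illustrative assumptions.

import numpy as np

def ray_box(o, d, lo, hi):
    # Slab test: does the ray (origin o, unit direction d) hit box [lo, hi]?
    with np.errstate(divide='ignore', invalid='ignore'):
        t1, t2 = (lo - o) / d, (hi - o) / d
    return np.min(np.maximum(t1, t2)) >= max(np.max(np.minimum(t1, t2)), 0.0)

def dacrt(ray_ids, sph_ids, lo, hi, O, D, C, R, hits, axis=0):
    # O, D: ray origins/directions; C, R: sphere centres/radii;
    # hits: per-ray nearest-hit distance, initialised to np.inf.
    ray_ids = [i for i in ray_ids if ray_box(O[i], D[i], lo, hi)]  # filter
    if not ray_ids or not sph_ids:
        return
    if len(sph_ids) <= 4:  # small leaf: intersect rays and spheres directly
        for i in ray_ids:
            for j in sph_ids:
                oc = O[i] - C[j]
                b = np.dot(oc, D[i])
                disc = b * b - np.dot(oc, oc) + R[j] ** 2
                if disc >= 0:
                    t = -b - np.sqrt(disc)
                    if 0 < t < hits[i]:
                        hits[i] = t  # record the nearest hit so far
        return
    mid = 0.5 * (lo[axis] + hi[axis])  # split the box in half along an axis
    hi_l, lo_r = hi.copy(), lo.copy()
    hi_l[axis] = lo_r[axis] = mid
    left = [j for j in sph_ids if C[j][axis] <= mid]   # crude partition of
    right = [j for j in sph_ids if C[j][axis] > mid]   # spheres by centre
    dacrt(ray_ids, left, lo, hi_l, O, D, C, R, hits, (axis + 1) % 3)
    dacrt(ray_ids, right, lo_r, hi, O, D, C, R, hits, (axis + 1) % 3)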

Our second work targets the shading phase of the raytracing pipeline. Traditional production renderers are primarily unidirectional path tracers, which trace rays from the camera into the scene. However, path tracing has difficulty rendering scenes with complicated lighting, which lends considerable aesthetic value to certain kinds of scenes. Bidirectional path tracing, on the other hand, traces rays from both the camera and the lights within the scene, and is hence able to render such scenes much more effectively than path tracing. Current production renderers are primarily CPU based, and only a few have started to employ GPUs for the entire pipeline. Limited memory compared to CPUs was the original reason for not employing GPUs; with current-generation GPUs offering much more memory than earlier generations, they have slowly been adopted for path tracing based pipelines. However, GPU based bidirectional path tracers are yet to be fully employed in a full production rendering scenario. Our work aims to bridge this gap and enable bidirectional path tracing in production rendering scenarios involving complex materials. Specifically, we provide a new connection mechanism for the light vertex cache bidirectional path tracing algorithm (LVC-BDPT) that improves shading efficiency, as well as a sort-based pipeline that improves the runtime performance of the algorithm on the GPU. We show that our method performs much better than the unoptimized version of the algorithm and provides much better quality for the same amount of computation.
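
The flavour of the sort-based pipeline can be conveyed by the toy sketch below: every camera-path vertex draws connections from a shared light vertex cache, and the connection work is then sorted by material id so that adjacent items evaluate the same BSDF during shading. The vertex layout and field names are hypothetical.

import random

def connect_lvc(camera_vertices, light_cache, per_vertex=1):
    # Pair each camera-path vertex with randomly chosen light vertices.
    work = [(cam, random.choice(light_cache))
            for cam in camera_vertices
            for _ in range(per_vertex)]
    # Sorting the connections by material makes shading coherent: on the
    # GPU, threads in a warp then run the same BSDF code path.
    work.sort(key=lambda pair: pair[0]["material"])
    return work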

 

Year of completion: July 2015
Advisor: P. J. Narayanan


Related Publications

  • Srinath Ravichandran and P. J. Narayanan - Parallel Divide and Conquer Ray Tracing, In SIGGRAPH ASIA 2013 Technical Briefs, 19-22 Nov. 2013, Hong Kong. [PDF]

  • Srinath Ravichandran and P. J. Narayanan - Coherent and Importance Sampled LVC-BDPT (Submitted to Siggraph Asia 2015 - Technical Briefs, Under Review).

Downloads

thesis

 ppt

 

RemoteVIS: A Remote Rendering System


Soumyajit Deb (homepage)

Large graphics models require a large amount of memory to store and high end graphics capabilities to render. It is useful to store such virtual models on a server and stream them to a remote client on demand for visualization. In many dynamic environments, this is the only option available. We present the design and implementation of RemoteVIS, a system to perform remote rendering. The goal of our system is to provide the best visualization quality given the capabilities and connection parameters of the client. The system is designed to adapt to a wide range of clients and connection speeds.

The system uses a streaming representation suited to the situation. This could range from a geometry-based representation combining levels of detail and visibility culling, to an image-based representation for clients with low capability. The client's graphics and storage capabilities, the speed of viewer motion, and the network bandwidth and latency are taken into account for this. The client acquires model data sufficient for an interval in the future using path prediction, for freeze and jerk free rendering. In a remote rendered walk through, the virtual environment, including all models, textures and other data, is stored on the server side. The client receives only the parts of the virtual environment that it is viewing, without having to download the entire environment.

One of the major bottlenecks of the system is the network bandwidth between the server and the client. The system optimizes the rendering quality and mesh detail of the world based upon this bandwidth. The geometry based renderer uses Object Based Visible Space Model Representation algorithms for selecting the visible geometry to be transmitted to the client; this automatically culls all hidden and invisible objects in the scene. On the client side, View Frustum Culling is used to render only the visible polygons. The client side code includes path based prediction of motion for a smooth jerk-free walk through. The image based renderer is useful in cases of extremely low bandwidth and/or when the client does not possess a hardware graphics accelerator. Our current implementation uses Visible Space Models as the geometry-based streaming format and trifocal tensors as the image-based streaming format. We present results of a study conducted on a representative range of the relevant parameters.
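
The adaptation policy described above might look like the toy rule below; the thresholds, names, and level-of-detail scale are illustrative guesses, not values from the system.

def choose_representation(bandwidth_kbps, has_gpu, viewer_speed):
    # Low-capability or bandwidth-starved clients get the image-based stream.
    if not has_gpu or bandwidth_kbps < 128:
        return ('image-based', None)
    # Faster viewer motion needs more prefetching via path prediction,
    # leaving less bandwidth for detail, so scale the level of detail down.
    budget = bandwidth_kbps / max(viewer_speed, 1.0)
    return ('geometry', min(4, int(budget // 256)))  # 0 = coarsest, 4 = finest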

 

Year of completion: 2008
Advisor: P. J. Narayanan


Downloads

thesis

ppt

     

Motion in Multiple Views


Sujit Kuthirummal

     

Mathematically, an image is the projection of the 3D world onto the 2D plane of the camera. This projection results in the loss of the information present in the third dimension, popularly referred to as the depth or the z dimension. It is easy to see that a plurality of projections can compensate for this loss, and this has led to the study of the geometry that underlies multiple views of the same scene.

Multiview analysis of scenes is an active area in Computer Vision today. The structure of points and lines as seen in two views attracted the attention of computer vision researchers like Longuet-Higgins and Olivier Faugeras in the eighties and early nineties. Similar studies on the underlying constraints in three or more views followed. The mathematical structure underlying multiple views has been analysed with respect to projective, affine, and Euclidean frameworks of the world, with remarkable results. These multiview relations have been used for visual recognition of 3D objects under changing view positions, object tracking by means of image stabilization, view synthesis of 3D objects from 2D views without recovering their 3D structure, and many other applications.

Multiple view situations in Computer Vision have been analyzed with two objectives: to derive scene-independent constraints relating multiple views, and to derive view-independent constraints relating multiple scene points. While the first approach seeks to model the configuration of cameras, the second attempts to characterize the configuration of points in the 3D world. The focus of our work is related to the second approach -- to derive constraints on the configuration of the points being imaged in a manner that does not depend on the viewpoint or imaging parameters. Non-rigid motion is difficult to analyze in this scheme. The case of multiple objects moving with different velocities or accelerations can be considered very close to the case of non-rigid motion. We have developed constraints on the projections of such point configurations, which can be categorized into two classes: time-dependent constraints, which are functions of time, and time-independent constraints, which hold at every time instant.

Bennet and Hoffman showed that polynomials to characterize a configuration of stationary points in a view-independent manner can be constructed from 2 views of 4 points under orthographic projection. This was extended by Carlsson to the case of scaled orthographic projection using 2 views of 5 points. Shashua and Levin generalized this to the case of an affine projection model for a time-dependent, view-independent constraint on the projections of 5 points moving with different but constant velocities.

We show that the projection of a point moving with constant velocity in the world moves with constant velocity in the image when the camera is affine. This result is used to formulate a view- and time-independent constraint on the velocities of the projections of 4 points, whose computation needs 2 views. This is a significant theoretical contribution, as it needs fewer points and accommodates a more general imaging model. In the same manner, points moving with constant acceleration in the world move with constant acceleration in the image as well. We derive time-dependent and time-independent constraints on the velocities and accelerations, respectively, of the projections of 4 points. These view-independent constraints can be used to recognize a configuration of 4 moving points and also to align the frames of synchronized videos.
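
The first claim follows in one line. Writing the affine camera as x = AX + b (notation assumed here for illustration), a world point X(t) = X0 + t V projects to

    x(t) = A(X0 + t V) + b = (A X0 + b) + t (A V),

an image point whose velocity A V is again constant; applying the same argument to second differences shows that constant acceleration is preserved as well.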

Though points moving with independent uniform velocities or accelerations model many non-rigid motion conditions, it is desirable to have a technique that accommodates general non-linear motion. We make the observation that as a point moves in the world it traces out a contour, and the trajectory traced out by its projection in an image is the projection of that world contour. Thus, the contours traced out in different views correspond. The problem of analysing the non-rigid motion of a point can therefore be transformed into the problem of analysing the contour traced out by its projections in various views. When the non-rigid motion in the world is restricted to a plane, that is, when the motion is planar, the contours traced out in the views can be thought of as projections of a planar shape, the shape being the trajectory in the world. We discuss how planar shape recognition techniques can be used to recognize and analyse these contours.
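
For a sense of what a planar shape recognizer looks like, the sketch below computes classic single-view Fourier descriptors of a closed contour; it illustrates only the general idea of contour-based recognition, not the thesis's multiview Fourier-domain constraints, which additionally account for the distortion between views.

import numpy as np

def fourier_descriptors(contour, n=16):
    # contour: N x 2 array of boundary points, read as complex z = x + iy.
    z = contour[:, 0] + 1j * contour[:, 1]
    F = np.fft.fft(z)
    F[0] = 0.0                 # drop the DC term: translation invariance
    F = F / np.abs(F[1])       # normalise by |F[1]|: scale invariance
    return np.abs(F[1:n + 1])  # magnitudes: rotation/start-point invariance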

We then combine the motion constraints with properties of a contour to devise a mechanism for recognizing a deforming contour, the points on whose boundary move with independent uniform linear velocities or accelerations. Given two reference views of the deforming contour, we can now recognize the same contour in any view at any time instant. We have also derived novel view-dependent parameterizations of the motion of the projections of points moving with uniform linear motion in the world, which enable us to arrive at view-independent constraints on the projections of points in motion that are simpler than the ones reported in the literature.

     

Year of completion: 2003
Advisors: C. V. Jawahar & P. J. Narayanan


Related Publications

  • Sujit Kuthirummal, C. V. Jawahar and P. J. Narayanan - Constraints on Coplanar Moving Points, Proceedings of the European Conference on Computer Vision (ECCV), LNCS 3024, May 2004, Prague, Czech Republic, pp. 168--179. [PDF]

  • Sujit Kuthirummal, C. V. Jawahar and P. J. Narayanan - Fourier Domain Representation of Planar Curves for Recognition in Multiple Views, Pattern Recognition, Vol. 37, No. 4, April 2004, pp. 739--754. [PDF]

  • Sujit Kuthirummal, C. V. Jawahar and P. J. Narayanan - Algebraic Constraints on Moving Points in Multiple Views, Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), Dec. 2002, Ahmedabad, India, pp. 311--316. [PDF]

  • Sujit Kuthirummal, C. V. Jawahar and P. J. Narayanan - Multiview Constraints for Recognition of Planar Curves in Fourier Domain, Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), Dec. 2002, Ahmedabad, India, pp. 323--328. [PDF]

  • Sujit Kuthirummal, C. V. Jawahar and P. J. Narayanan - Video Frame Alignment in Multiple Views, Proceedings of the International Conference on Image Processing (ICIP), Sep. 2002, Rochester, NY, Vol. 3, pp. 357--360. [PDF]

  • Sujit Kuthirummal, C. V. Jawahar and P. J. Narayanan - Planar Shape Recognition across Multiple Views, Proceedings of the International Conference on Pattern Recognition (ICPR), Aug. 2002, Quebec City, Canada, pp. 482--488. [PDF]

  • Sujit Kuthirummal, C. V. Jawahar and P. J. Narayanan - Frame Alignment using Multi View Constraints, Proceedings of the National Conference on Communications (NCC), Jan. 2002, Mumbai, India, pp. 523--527. [PDF]


Downloads

thesis

ppt
