Work on handwriting analysis at CVIT covers recognition, synthesis, annotation, search,
and classification of handwritten data. We focus primarily
on online handwriting, where the temporal information of the writing process is
available in the handwritten data, although many of the approaches we use
extend to offline handwriting as well. Recognition of online
handwriting in Indian languages is especially significant because it can serve as an effective
mechanism for data input, in contrast to keyboards, which need multiple keystrokes
and control sequences to input many characters.
Handwriting Synthesis ::
Handwriting synthesis is the problem of generating data close to how a human
would write the text. The characteristics of the generated data can be those
of a specific writer or those of a generic model. Synthesis of handwriting poses
a challenge because writer-specific features need to be captured and preserved, yet
at the same time the variability within handwriting must also be taken into
account. Even with a given model, synthesis should not be deterministic,
since the variations found in human handwriting are stochastic.
Applications of handwriting synthesis include the automatic creation of personalized
handwritten documents, the generation of large amounts of annotated handwritten data for training
recognition engines, and writer-independent matching and retrieval of handwritten
documents.
For synthesis of Indic scripts, we model the handwriting at two levels.
A stroke-level model captures the writing style and hand movements
of the individual strokes. A space-time layout model then arranges
the synthesized strokes to form words. Both the stroke model and the layout
model can be learned from examples: the method can learn from a single
example as well as from a large collection that captures the variations. The model
also allows us to synthesize words in multiple Indic scripts through
transliteration.
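The two-level idea can be illustrated with a minimal sketch. It is not the model used in this work: the stroke templates, the Gaussian perturbations (scale, slant, jitter), and the left-to-right layout with noisy spacing are all assumptions chosen for illustration, and the layout here covers only spatial placement, with stroke order standing in for time.

```python
import random

# Hypothetical stroke templates: each stroke is a list of (x, y) pen positions.
STROKE_TEMPLATES = {
    "vertical_bar": [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0)],
    "head_line": [(0.0, 1.0), (0.5, 1.0), (1.0, 1.0)],
}


def sample_stroke(template, rng, scale_sd=0.05, slant_sd=0.05, jitter_sd=0.01):
    """Stroke-level model (assumed): draw one random variant of a template."""
    scale = 1.0 + rng.gauss(0.0, scale_sd)  # overall size variation
    slant = rng.gauss(0.0, slant_sd)        # horizontal shear of the stroke
    points = []
    for x, y in template:
        px = scale * (x + slant * y) + rng.gauss(0.0, jitter_sd)
        py = scale * y + rng.gauss(0.0, jitter_sd)
        points.append((px, py))
    return points


def synthesize_word(stroke_names, rng, advance=1.2, advance_sd=0.05):
    """Layout model (assumed): place sampled strokes left to right."""
    word, cursor = [], 0.0
    for name in stroke_names:
        stroke = sample_stroke(STROKE_TEMPLATES[name], rng)
        word.append([(x + cursor, y) for x, y in stroke])
        cursor += advance + rng.gauss(0.0, advance_sd)  # noisy spacing
    return word


# Two renderings of the same stroke sequence differ, as human writing does.
word_a = synthesize_word(["vertical_bar", "head_line"], random.Random(1))
word_b = synthesize_word(["vertical_bar", "head_line"], random.Random(2))
```

Passing an explicit `random.Random` instance keeps each rendering reproducible for a given seed while still producing different output across seeds.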
Annotation and Search of Handwritten Data ::
Annotation of handwriting is the process of labeling input data so that it can be used to train
systems for a variety of handwriting analysis problems, such as handwriting recognition
and writer identification. However, manual annotation of large datasets is a tedious,
expensive, and error-prone process, especially at the character and stroke levels. The lack of
proper linguistic resources in the form of annotated datasets is a major hurdle in building
recognizers for Indian languages.
In many practical situations, plain transcripts of the handwritten data are available and
can be used to make annotation easier. Data can be collected
in different settings, such as unrestricted data, designed text, dictation, and data generation
(using handwriting synthesis). A parallel text is available in all of these cases except
unrestricted data. For annotation, we use the model-based handwriting synthesis unit
described above to map the text corpus into handwriting space, and the annotation is then propagated
to the word and character levels using elastic matching of handwriting. Stroke-level
annotation for online handwriting recognition is currently done using semi-automatic
tools.
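As an illustration of label propagation by elastic matching, the sketch below uses dynamic time warping (DTW), one common form of elastic matching; the text does not state which matching method is actually used. The function names, the toy point sequences, and the labels "ka" and "ma" are hypothetical.

```python
import math


def dtw_path(a, b):
    """Return a minimum-cost DTW alignment of point sequences a and b
    as a list of index pairs (i, j)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def propagate_labels(synth_points, synth_labels, real_points):
    """Copy each label from the synthesized (labeled) trace to the real
    points it is aligned with; the first aligned label wins."""
    labels = [None] * len(real_points)
    for i, j in dtw_path(synth_points, real_points):
        if labels[j] is None:
            labels[j] = synth_labels[i]
    return labels


# Synthesized trace with known character labels, and a denser real trace.
synth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
synth_labels = ["ka", "ka", "ma", "ma"]
real = [(0.0, 0.1), (0.5, 0.1), (1.0, 0.1), (2.0, 0.1), (2.5, 0.1), (3.0, 0.1)]
print(propagate_labels(synth, synth_labels, real))
# prints ['ka', 'ka', 'ka', 'ma', 'ma', 'ma']
```

Because DTW tolerates differences in sampling rate and writing speed, the real trace does not need the same number of points as the synthesized one.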
Online Handwriting Recognition ::
We aim to build robust and accurate recognition engines for Indian languages, specifically
Hindi, Telugu and Malayalam. Some features of Indian languages and their
writing necessitate a different approach to recognition than the one used for English:
- The primary unit of a word is an akshara, which is a combination of
consonants ending in a vowel.
- Each language has a very large set of aksharas, usually several thousand.
- Each akshara is composed of one or more whole strokes and never contains a partial stroke.
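To make the akshara unit concrete, here is a simplified sketch that splits a Devanagari word into aksharas using Unicode code points. It is an assumption-laden illustration, not part of the recognition system: it treats a consonant that follows a virama as part of the same cluster and attaches vowel signs and other marks to the preceding base character, ignoring many script-specific cases.

```python
VIRAMA = "\u094d"  # Devanagari sign virama (halant), joins consonants


def is_base(ch):
    """True for Devanagari independent vowels and consonants."""
    cp = ord(ch)
    return 0x0904 <= cp <= 0x0939 or 0x0958 <= cp <= 0x095F


def split_aksharas(word):
    """Split a Devanagari word into aksharas (simplified rules)."""
    aksharas = []
    for ch in word:
        joins_previous = bool(aksharas) and aksharas[-1].endswith(VIRAMA)
        if is_base(ch) and not joins_previous:
            aksharas.append(ch)   # a new akshara starts here
        elif aksharas:
            aksharas[-1] += ch    # vowel sign, virama, or conjoined consonant
        else:
            aksharas.append(ch)   # leading mark with no base character
    return aksharas


namaste = "\u0928\u092e\u0938\u094d\u0924\u0947"  # the word "namaste"
print(len(split_aksharas(namaste)))
# prints 3: na, ma, and the conjunct ste (two consonants plus a vowel sign)
```

The third akshara shows why the symbol set grows so large: consonant clusters combine with every vowel sign to form distinct units.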
Building a robust and accurate recognition system poses a variety of research challenges, especially
for Indian languages. We concentrate on problems such as building large-class
hierarchical classifiers, specifically for handwriting recognition and OCR; discriminative
classifiers for differentiating similar-looking time-series data (strokes); compact representation
of class models; and efficient spell checkers for languages with a large number of word-form
variations.
Writer Identification ::
Writer identification is the process of identifying the authorship of handwritten documents.
The relevance of a document in civil and criminal litigation depends primarily on our ability
to assign authorship to that document.
Publications ::
- C. V. Jawahar and A. Balasubramanian: Synthesis of Online Handwriting in Indian Languages,
in Proceedings of the International Workshop on Frontiers in Handwriting Recognition (IWFHR'06),
La Baule, France, October 2006.
- Anoop Namboodiri and Sachin Gupta: Text Independent Writer Identification
for Online Handwriting, in Proceedings of the International Workshop on
Frontiers in Handwriting Recognition (IWFHR'06), La Baule, France, October
2006.
- Anand Kumar, A. Balasubramanian, Anoop M. Namboodiri and C. V. Jawahar:
Model-Based Annotation of Online Handwritten Datasets, in Proceedings
of the International Workshop on Frontiers in Handwriting Recognition (IWFHR'06),
Centre de Congrès Atlantia, La Baule, France, October 23-26, 2006.
- Karteek Alahari, Satya Lahari Putrevu and C. V. Jawahar: Learning Mixtures
of Offline and Online Features for Handwritten Stroke Recognition, in
Proceedings of the IEEE International Conference on Pattern Recognition (ICPR'06),
Hong Kong, August 2006, Vol. III, pp. 379-382.
- Karteek Alahari, Satya Lahari P and C. V. Jawahar: Discriminant Substrokes
for Online Handwriting Recognition, in Proceedings of the Eighth International
Conference on Document Analysis and Recognition (ICDAR), Seoul, Korea, 2005, Vol. 1,
pp. 499-503.
- A. Bhaskarbhatla, S. Madhavanath, M. Pavan Kumar, A. Balasubramanian and C. V. Jawahar:
Representation and Annotation of Online Handwritten Data, in Proceedings of the
International Workshop on Frontiers in Handwriting Recognition (IWFHR), Tokyo,
Japan, October 2004, pp. 136-141.
- Pranav Reddy and C. V. Jawahar: The Role of Online and Offline Features in the
Development of a Handwritten Signature Verification System, in Proceedings of the
National Conference on Document Analysis and Recognition (NCDAR), Mandya,
India, July 2001, pp. 85-94.
People ::
A. Balasubramanian
Karteek Alahari
Anand Kumar
Naveen Chandra Tewari
Sachin Gupta
Amit Sangroya
Anurag Mangal
Geetika Katragadda
Haritha Bellam
Anil Gavini
Anubhaw Srivastava
Rama Praveen
Anoop Namboodiri
C. V. Jawahar