Handwritten Text Retrieval from Unseen Collections
Demo Video
The link for the video: Demo Video
We propose a novel approach to generate annotations for medical images of several modalities in a semi-automated manner. In contrast to existing methods, our method can be implemented with any semantic segmentation method for medical images, allows correction of multiple labels at the same time, and supports the addition of missing labels.
Semantic segmentation of medical images is an essential first step in computer-aided diagnosis systems for many applications. However, modern deep neural networks (DNNs) have generally shown inconsistent performance for clinical use. This has led researchers to propose interactive image segmentation techniques where the output of a DNN can be interactively corrected by a medical expert to the desired accuracy. However, these techniques often need separate training data with the associated human interactions, and do not generalize to other diseases and types of medical images. In this paper, we suggest a novel conditional inference technique for deep neural networks which takes the intervention by a medical expert as test-time constraints and performs inference conditioned upon these constraints. Our technique is generic and can be used for medical images from any modality. Unlike other methods, our approach can correct multiple structures at the same time and add structures missed at the initial segmentation. We report an improvement of 13.3, 12.5, 17.8, 10.2, and 12.4 times in terms of user annotation time compared to full human annotation for nucleus, multiple-cell, liver-and-tumor, organ, and brain segmentation respectively. In comparison to other interactive segmentation techniques, we report a time saving of 2.8, 3.0, 1.9, 4.4, and 8.6 fold. Our method can be useful to clinicians for diagnosis and post-surgical follow-up with minimal intervention from the medical expert.
Bhavani Sambaturu*, Ashutosh Gupta, C.V. Jawahar* and Chetan Arora
Semi-Automatic Medical Image Annotation, MICCAI, 2021.
[ PDF ] | [Supplementary] | [BibTeX]
Some additional details are provided here which we were unable to include in the paper due to space constraints.
Our approach can interactively correct the segmentation of multiple labels at the same time.
Our method has the capability to add labels missed at the initial segmentation.
We can perform interactive segmentation of organs for which the pre-trained model was not trained.
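The conditional inference at the heart of our approach can be illustrated with a minimal sketch. This is a simplified illustration and not the exact formulation in the paper: the pixels marked by the expert are treated as test-time constraints, and the pretrained network is briefly updated so that its prediction satisfies them. The names model, image, and clicks below are placeholders.

import torch
import torch.nn.functional as F

def constrained_inference(model, image, clicks, steps=50, lr=1e-4):
    # image : (1, C, H, W) input tensor
    # clicks: list of (row, col, class_id) corrections from the medical expert
    model.train()  # enable gradients; the weights are only updated briefly
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    rows = torch.tensor([r for r, _, _ in clicks])
    cols = torch.tensor([c for _, c, _ in clicks])
    target = torch.tensor([k for _, _, k in clicks])

    for _ in range(steps):
        logits = model(image)                    # (1, num_classes, H, W)
        clicked = logits[0, :, rows, cols].t()   # (num_clicks, num_classes)
        loss = F.cross_entropy(clicked, target)  # penalise constraint violations
        opt.zero_grad()
        loss.backward()
        opt.step()

    model.eval()
    with torch.no_grad():
        return model(image).argmax(dim=1)        # corrected segmentation map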
The details of the networks used in our paper are given here.
The network is based on the DRIU architecture. It is a cascaded architecture where the liver is segmented first, followed by the lesion.
A multi-branch network has been proposed which performs nuclear instance segmentation and classification at the same time. The horizontal and vertical distances of nuclear pixels from their centers of mass are leveraged to separate clustered cells.
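As an illustration of the distance-map idea (our own NumPy sketch, not code from the original authors), the following computes horizontal and vertical maps in which every pixel of a nucleus stores its normalised offset from that nucleus' center of mass.

import numpy as np

def hover_maps(instance_mask):
    # instance_mask: (H, W) integer array, 0 = background, 1..N = nucleus ids
    h_map = np.zeros(instance_mask.shape, dtype=np.float32)
    v_map = np.zeros(instance_mask.shape, dtype=np.float32)
    for inst_id in np.unique(instance_mask):
        if inst_id == 0:
            continue
        ys, xs = np.nonzero(instance_mask == inst_id)
        cy, cx = ys.mean(), xs.mean()             # center of mass of the nucleus
        dx, dy = xs - cx, ys - cy
        # normalise the offsets to [-1, 1] within each instance
        h_map[ys, xs] = dx / (np.abs(dx).max() + 1e-6)
        v_map[ys, xs] = dy / (np.abs(dy).max() + 1e-6)
    return h_map, v_map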
An autofocus layer for semantic segmentation has been proposed here. The autofocus layer adaptively changes the size of the receptive field, which is used to obtain features at various scales. The convolutional layers are parallelized with an attention mechanism.
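A rough 2D PyTorch sketch of an autofocus-style block is given below; the published layer works on 3D volumes, and the channel sizes and dilation rates here are illustrative only. A shared kernel is applied at several dilation rates, and a small attention branch predicts how much each receptive-field size should contribute at every location.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AutofocusBlock(nn.Module):
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.dilations = dilations
        # a single kernel shared across all dilation rates
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
        # attention branch producing one weight map per dilation rate
        self.attention = nn.Sequential(
            nn.Conv2d(in_ch, in_ch // 2, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_ch // 2, len(dilations), 1),
        )

    def forward(self, x):
        att = F.softmax(self.attention(x), dim=1)   # weights over dilation rates
        out = 0
        for i, d in enumerate(self.dilations):
            # same weights, different dilation = different receptive field size
            feat = F.conv2d(x, self.conv.weight, padding=d, dilation=d)
            out = out + att[:, i:i + 1] * feat
        return out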
Previous approaches have only attempted to generate sign language from text; we focus on directly converting speech segments into sign language. Our work opens up several assistive technology applications and can help communicate effectively with people suffering from hearing loss.
We aim to solve the highly challenging task of generating continuous sign language videos solely from speech segments for the first time. Recent efforts in this space have focused on generating such videos from human-annotated text transcripts without considering other modalities. However, replacing speech with sign language proves to be a practical solution while communicating with people suffering from hearing loss. Therefore, we eliminate the need to use text as input and design techniques that work for more natural, continuous, freely uttered speech covering an extensive vocabulary. Since the current datasets are inadequate for generating sign language directly from speech, we collect and release the first Indian sign language dataset comprising speech-level annotations, text transcripts, and the corresponding sign-language videos. Next, we propose a multi-tasking transformer network trained to generate a signer's poses from speech segments. With speech-to-text as an auxiliary task and an additional cross-modal discriminator, our model learns to generate continuous sign pose sequences in an end-to-end manner. Extensive experiments and comparisons with other baselines demonstrate the effectiveness of our approach. We also conduct additional ablation studies to analyze the effect of different modules of our network. A demo video containing several results is attached to the supplementary material.
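A highly simplified sketch of the multi-tasking idea is given below: a shared transformer encoder over speech features feeds both a pose head (the main task) and a text head (the auxiliary speech-to-text task). The dimensions and module names are ours, and the autoregressive pose decoder and cross-modal discriminator used in the full model are omitted.

import torch
import torch.nn as nn

class SpeechToSignSketch(nn.Module):
    def __init__(self, d_model=256, n_joints=50, vocab_size=1000):
        super().__init__()
        self.speech_proj = nn.Linear(80, d_model)   # e.g. 80-dim mel filterbank frames
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=4)
        self.pose_head = nn.Linear(d_model, n_joints * 2)   # 2D joint coordinates
        self.text_head = nn.Linear(d_model, vocab_size)     # auxiliary text logits

    def forward(self, mel):                       # mel: (B, T, 80) speech features
        h = self.encoder(self.speech_proj(mel))   # shared representation
        return self.pose_head(h), self.text_head(h)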
--- COMING SOON ---
Santhoshini Gongidi and C. V. Jawahar
[ Teaser Video ] [Paper] [Code]
Overview
Handwritten text recognition for Indian languages is not yet a well-studied problem. This is primarily due to the unavailability of large annotated datasets in the associated scripts. We introduce a large-scale handwritten dataset for Indic scripts, referred to as the IIIT-INDIC-HW-WORDS dataset. The dataset consists of 872K handwritten instances written by 135 writers in 8 Indic scripts. Together with our earlier datasets IIIT-HW-DEV and IIIT-HW-TELUGU in Devanagari and Telugu respectively, the newly introduced dataset provides annotated handwritten word instances in all 10 prominent Indic scripts.
We further establish a high baseline for text recognition in eight Indic scripts. Our recognition scheme follows contemporary design principles from the recognition literature and yields competitive results on English. We further (i) study the reasons for changes in HTR performance across scripts, and (ii) explore the utility of pre-training for Indic HTRs. We hope our efforts will catalyze research and fuel applications related to handwritten document understanding in Indic scripts.
The zip file for each language contains image folders, a label file, and a vocabulary file. Please refer to the README file for further instructions; a minimal loading sketch is also given after the download table below.
Language | Download link |
Bengali | Link |
Gujarati | Link |
Gurumukhi | Link |
Kannada | Link |
Odia | Link |
Malayalam | Link |
Tamil | Link |
Urdu | Link |
For the Devanagari and Telugu datasets, please follow the Link.
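A hypothetical loading snippet is given below; it assumes the label file lists one "<relative image path> <transcription>" pair per line, so please check the README of your download for the exact format. The file name train.txt is a placeholder.

from pathlib import Path
from PIL import Image

def load_split(root, label_file="train.txt"):
    samples = []
    root = Path(root)
    with open(root / label_file, encoding="utf-8") as f:
        for line in f:
            rel_path, transcription = line.rstrip("\n").split(maxsplit=1)
            samples.append((root / rel_path, transcription))
    return samples

# Example usage (paths are illustrative):
# data = load_split("IIIT-INDIC-HW-WORDS/bengali")
# image = Image.open(data[0][0]); print(data[0][1])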
Santhoshini Gongidi and C. V. Jawahar, INDIC-HW-WORDS: A Dataset for Indic Handwritten Text Recognition, International Conference on Document Analysis and Recognition (ICDAR), 2021. [ PDF ]
For any queries about the dataset, please contact the authors below:
Santhoshini Gongidi:
This work addresses the problem of scene text recognition in Indic scripts. As a first step, we benchmark scene text recognition for three Indian scripts - Devanagari, Telugu and Malayalam - using a CRNN model. To this end, we release a new dataset for the three scripts, comprising nearly 1000 images in the wild, which is suitable for scene text detection and recognition tasks. We train the recognition model using synthetic word images. This data is also made publicly available.
Natural scene images with text in Indic scripts: Telugu, Malayalam and Devanagari, in clockwise order
Sample word images from the dataset of synthetic word images
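For reference, a minimal CRNN-style recogniser in the spirit of the baseline described above looks like the sketch below: a small CNN produces a feature sequence, a bidirectional LSTM models context, and per-timestep logits are trained with a CTC loss. All layer sizes are illustrative and not our exact configuration.

import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.rnn = nn.LSTM(128 * 8, 256, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes + 1)    # +1 for the CTC blank symbol

    def forward(self, x):                 # x: (B, 1, 32, W) grayscale word images
        f = self.cnn(x)                   # (B, 128, 8, W/4)
        f = f.permute(0, 3, 1, 2).flatten(2)   # (B, W/4, 128*8) feature sequence
        h, _ = self.rnn(f)
        return self.fc(h)                 # (B, W/4, num_classes + 1) CTC logits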
If you use the dataset of real scene text images for the three Indic scripts or the synthetic dataset comprising font-rendered word images, please consider citing our paper titled Benchmarking Scene Text Recognition in Devanagari, Telugu and Malayalam.
@INPROCEEDINGS{IL-SCENETEXT_MINESH,
author={M. {Mathew} and M. {Jain} and C. V. {Jawahar}},
booktitle={ICDAR MOCR Workshop},
title={Benchmarking Scene Text Recognition in Devanagari, Telugu and Malayalam},
year={2017}}
IIIT-ILST
IIIT-ILST contains nearly 1000 real images per script, annotated with scene text bounding boxes and transcriptions.
Terms and Conditions: All images provided as part of the IIIT-ILST dataset have been collected from freely accessible internet sites. As such, they are copyrighted by their respective authors. The images are hosted on this site, physically located in India. Specifically, the images are provided with the sole purpose of being used for research in developing new models for scene text recognition. You are not allowed to redistribute any images in this dataset. By downloading any of the files below you explicitly state that your final purpose is to perform research under the conditions mentioned above and you agree to comply with these terms and conditions.
Download
IIIT-ILST DATASET
Synthetic Word Images Dataset
We rendered millions of word images synthetically to train scene text recognition models for the three scripts.
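An illustrative renderer, not our exact pipeline, that produces a single synthetic word image from a Unicode string and a font file using Pillow is shown below. Note that correct shaping of complex Indic scripts generally requires a Pillow build with libraqm support.

from PIL import Image, ImageDraw, ImageFont

def render_word(word, font_path, font_size=48, pad=10):
    font = ImageFont.truetype(font_path, font_size)
    # measure the rendered text to size the canvas
    left, top, right, bottom = font.getbbox(word)
    img = Image.new("L", (right - left + 2 * pad, bottom - top + 2 * pad), color=255)
    draw = ImageDraw.Draw(img)
    draw.text((pad - left, pad - top), word, fill=0, font=font)
    return img

# e.g. render_word("telugu_word", "NotoSansTelugu-Regular.ttf")  # font name is illustrative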
Errata
The synthetic images dataset provided below is different from the one we originally used in our ICDAR 2017 MOCR workshop paper. The synthetic dataset used in the original work had some images with junk characters resulting from incorrect rendering of certain Unicode points with some fonts. Hence, we re-generated the synthetic word images; the set provided below is this new one.
Download
Synthetic Word Images (Part-I) | Synthetic Word Images (Part-II)
License
The code we used to generate synthetic word images for Indian languages is added here. The repo includes a collection of Unicode fonts for many Indian scripts, which you might find useful for works where you want to generate synthetic word images in Indian scripts and Arabic.