
MeronymNet: A Hierarchical Model for Unified and Controllable Multi-Category Object Generation


Rishabh Baghel, Abhishek Trivedi, Tejas Ravichandran, and Ravi Kiran Sarvadevabhatla

[Project Page Link]   [Paper]   [GitHub]



Overview

  • We introduce MeronymNet, a novel hierarchical approach for controllable, part-based generation of multi-category objects using a single unified model.
  • We adopt a guided coarse-to-fine strategy involving semantically conditioned generation of bounding box layouts, pixel-level part layouts and ultimately, the object depictions themselves.
  • We use Graph Convolutional Networks and Deep Recurrent Networks, along with custom-designed Conditional Variational Autoencoders, to enable flexible, diverse, and category-aware generation of 2-D objects in a controlled manner (a rough sketch of this pipeline follows this list).
  • Evaluation scores for the generated objects show that MeronymNet outperforms multiple strong baselines and ablative variants.
  • We also showcase MeronymNet's suitability for controllable object generation and interactive object editing at various levels of structural and semantic granularity.
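
As a rough illustration of this coarse-to-fine pipeline, here is a minimal sketch, not the authors' implementation: it samples from three chained conditional decoders, omits the Graph Convolutional and Recurrent components, and the class name StageCVAE and all dimensions are assumptions.

    # Hypothetical sketch of a guided coarse-to-fine sampler: each stage is
    # conditioned on the object category and on the previous stage's output.
    import torch
    import torch.nn as nn

    class StageCVAE(nn.Module):
        """Decoder side of a minimal conditional VAE stage."""
        def __init__(self, cond_dim, out_dim, z_dim=32):
            super().__init__()
            self.z_dim = z_dim
            self.decode = nn.Sequential(
                nn.Linear(cond_dim + z_dim, 128), nn.ReLU(),
                nn.Linear(128, out_dim),
            )

        def sample(self, cond):
            z = torch.randn(cond.size(0), self.z_dim)      # latent noise
            return self.decode(torch.cat([cond, z], dim=-1))

    # Three conditioned stages: box layout -> part label mask -> RGB object.
    n_cat, n_parts = 10, 24
    box_stage  = StageCVAE(cond_dim=n_cat,               out_dim=n_parts * 4)
    mask_stage = StageCVAE(cond_dim=n_cat + n_parts * 4, out_dim=64 * 64)
    rgb_stage  = StageCVAE(cond_dim=n_cat + 64 * 64,     out_dim=64 * 64 * 3)

    category = torch.eye(n_cat)[[3]]                       # one-hot condition
    boxes = box_stage.sample(category)                     # coarse: part boxes
    mask  = mask_stage.sample(torch.cat([category, boxes], -1))  # part layout
    rgb   = rgb_stage.sample(torch.cat([category, mask], -1))    # final object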

Results

Sample generations by MeronymNet. For each sample, the generated bounding boxes, the corresponding label mask, and the RGB object are shown. Note the diversity in part count, appearance, and viewpoint among the generated objects.


Application Scenario: Interactive Modification

Our model gives users part-level control, exercised through either boxes or masks. Note that the rendering viewpoint has changed from the initial generation to accommodate the updated part list. This scenario demonstrates MeronymNet's holistic, part-based awareness of the rendering viewpoints best suited to various part sets.


Dataset

We use PASCAL Parts, a large-scale dataset of part-segmented objects. The plot shows the density distribution of part counts across object instances for each category. The varying range and frequency of part occurrences across categories, combined with the requirement of generating objects from a single unified model, poses significant challenges.


Contact

  1. If you have any questions, please contact Dr. Ravi Kiran Sarvadevabhatla.

Automated Tree Generation Using Grammar & Particle System


Aryamaan Jain, Jyoti Sunkara, Ishaan Shah, Avinash Sharma and K S Rajan

 

Abstract

Trees are an integral part of many outdoor scenes and are rendered in a wide variety of computer applications such as games, movies, simulations, architectural models, AR, and VR. This has led to increasing demand for realistic, intuitive, lightweight, and easy-to-produce computer-generated trees. Current approaches to 3D tree generation that rely on a library of trees lack structural variation and are repetitive. This paper presents an extended grammar-based automated solution for 3D tree generation that can model a wide range of species, both Western and Indian. For the foliage, we adopt a particle-system approach that models each leaf, its size, orientation, and changes. The proposed solution additionally allows control over individual trees, thus modelling variations in tree growth, changes in foliage across seasons, and leaf structure. This enables the generation of virtual forests with different tree compositions. In addition, a Blender add-on has been developed and will be released.
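
As a rough illustration of grammar-based tree generation in general, here is a minimal L-system sketch under assumed rewrite rules; it is not the paper's actual grammar, and it omits the particle-system foliage.

    # Minimal L-system sketch: repeatedly rewrite symbols via production rules.
    # The rule set below is illustrative only, not the paper's grammar.
    import random

    RULES = {
        # A trunk segment can spawn side branches; multiple variants per
        # symbol give structural variation between generated trees.
        "F": ["F[+F]F[-F]F", "F[+F]F"],
    }

    def expand(axiom: str, depth: int) -> str:
        """Apply the production rules `depth` times, picking rule variants
        at random so each generated tree differs in structure."""
        s = axiom
        for _ in range(depth):
            s = "".join(random.choice(RULES[c]) if c in RULES else c for c in s)
        return s

    # In a turtle interpretation, '[' pushes state, ']' pops it, '+'/'-' rotate.
    print(expand("F", depth=3)[:80])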


Download

  1. Click here to download paper
  2. Click here to download blender plugin
  3. Click here to download tree generation code
  4. Click here to download grammar files
  5. Click here to download 3D tree dataset

Contact

  1. Aryamaan Jain
  2. Jyoti Sunkara
  3. Ishaan Shah
  4. Avinash Sharma
  5. K S Rajan

Towards Boosting the Accuracy of Non-Latin Scene Text Recognition


Sanjana Gunna, Rohit Saluja, and C.V. Jawahar

[Video]   [Paper]   [Code]


Clockwise from top-left: Gujarati, Hindi, Bangla, Tamil, Telugu, and Malayalam. Top: annotated scene-text images. Bottom: baseline predictions (row 1) and transfer-learning model predictions (row 2). Green, red, and “ ” represent correct predictions, errors, and missing characters, respectively.

Abstract

Scene-text recognition is remarkably better for Latin languages than for non-Latin languages owing to several factors, such as the availability of multiple fonts, simpler vocabulary statistics, up-to-date data-generation tools, and writing systems. This work examines the possible reasons for low accuracy by comparing English datasets with non-Latin languages. We perform several controlled experiments on English, varying the number of (i) fonts used to create the synthetic data and (ii) generated word images, and find that both factors are critical for scene-text recognition systems. We share 55 additional fonts for Arabic and 97 new fonts for Devanagari, found through a region-wise online search; these fonts were not used in previous scene-text recognition works, and their details can be found in the GitHub repository. We apply our learnings to improve on the state-of-the-art results for two non-Latin languages, Arabic and Devanagari, achieving Word Recognition Rate (WRR) gains on the IIIT-ILST and MLT datasets.
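
As a rough sketch of such a controlled experiment, the snippet below renders synthetic word images from a font pool of controllable size. It assumes Pillow for rendering; the font paths and word list are hypothetical placeholders, and this is not the authors' data-generation pipeline.

    # Hedged sketch: generate synthetic word images while varying (i) the
    # number of fonts and (ii) the number of word images, the two factors
    # studied in the controlled experiments.
    import random
    from PIL import Image, ImageDraw, ImageFont

    def make_dataset(words, font_paths, n_fonts, n_images, size=(200, 50)):
        fonts = random.sample(font_paths, n_fonts)   # controlled factor (i)
        samples = []
        for _ in range(n_images):                    # controlled factor (ii)
            word = random.choice(words)
            font = ImageFont.truetype(random.choice(fonts), 32)
            img = Image.new("RGB", size, "white")
            ImageDraw.Draw(img).text((10, 5), word, font=font, fill="black")
            samples.append((img, word))
        return samples

    # Hypothetical usage:
    # data = make_dataset(["नमस्ते"], ["devanagari_font.ttf"], n_fonts=1, n_images=1000)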


Paper

  • Paper
    Towards Boosting the Accuracy of Non-Latin Scene Text Recognition

    Sanjana Gunna, Rohit Saluja, C.V. Jawahar
    Towards Boosting the Accuracy of Non-Latin Scene Text Recognition, International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR 2021), 2021.
    [PDF] | [BibTeX]

    @inproceedings{gunnaNonLatin2021,
    title={Towards {B}oosting the {A}ccuracy of {N}on-{L}atin {S}cene {T}ext {R}ecognition},
    author={Sanjana Gunna and Rohit Saluja and C V Jawahar},
    booktitle={2021 International Conference on Document Analysis and Recognition Workshops (ICDARW)},
    year={2021}
    }

Contact

  1. Sanjana Gunna
  2. Rohit Saluja

Transfer Learning for Scene Text Recognition in Indian Languages


Sanjana Gunna, Rohit Saluja, and C.V. Jawahar

[Video]   [Paper]   [Code]


Clockwise from top-left: Gujarati, Hindi, Bangla, Tamil, Telugu, and Malayalam. Top: annotated scene-text images. Bottom: baseline predictions (row 1) and transfer-learning model predictions (row 2). Green, red, and “ ” represent correct predictions, errors, and missing characters, respectively.

Abstract

Scene text recognition in low-resource Indian languages is challenging because of complexities such as multiple scripts, fonts, text sizes, and orientations. In this work, we investigate the power of transfer learning for all layers of deep scene-text recognition networks, from English to two common Indian languages. We show that transferring English models to simple synthetic datasets of Indian languages is not practical. Instead, we propose applying transfer learning techniques among Indian languages, and we study such transfer among six Indian languages with varying complexities in fonts and word-length statistics. We achieve gains in Word Recognition Rate (WRR) on the IIIT-ILST Hindi, Telugu, and Malayalam datasets compared to previous works, and we observe similar WRR gains over our baseline models on the MLT-19 Hindi and Bangla datasets and on our new Gujarati and Tamil datasets.
We additionally release a dataset of around 440 scene images containing 500 Gujarati and 2535 Tamil words.
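
As a hedged sketch of what all-layer transfer looks like in practice, the snippet below uses a toy stand-in model, not the authors' released code; real scene-text recognizers are CRNN-like and trained with a CTC objective rather than the dummy classification loss used here.

    # Hedged sketch: initialise a target-language recognizer from a
    # source-language checkpoint and fine-tune ALL layers (no freezing).
    import torch
    import torch.nn as nn

    def make_recognizer(num_classes: int) -> nn.Module:
        # Toy stand-in for a scene-text recognizer.
        return nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, num_classes),
        )

    src = make_recognizer(num_classes=110)   # e.g. trained on Hindi
    tgt = make_recognizer(num_classes=70)    # e.g. a Gujarati charset

    # Transfer every layer except the final projection ("4." in this toy
    # Sequential), whose shape depends on the target charset size.
    state = {k: v for k, v in src.state_dict().items() if not k.startswith("4.")}
    tgt.load_state_dict(state, strict=False)

    # Continue training all layers on target-language word images (dummy batch).
    optim = torch.optim.Adam(tgt.parameters(), lr=1e-4)
    x, y = torch.randn(8, 3, 32, 128), torch.randint(0, 70, (8,))
    loss = nn.functional.cross_entropy(tgt(x), y)
    optim.zero_grad(); loss.backward(); optim.step()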


Paper

  • Paper
    Transfer Learning for Scene Text Recognition in Indian Languages

    Sanjana Gunna, Rohit Saluja, C.V. Jawahar
    Transfer Learning for Scene Text Recognition in Indian Languages, ICDAR 2021 Workshop on Camera-Based Document Analysis and Recognition (CBDAR 2021, 9th edition), 2021.
    [PDF] | [BibTeX]

    @inproceedings{Gunna2021TransferLF,
    title={Transfer Learning for Scene Text Recognition in Indian Languages},
    author={Sanjana Gunna and Rohit Saluja and C. V. Jawahar},
    booktitle={ICDAR Workshops},
    year={2021}
    }

Dataset

The dataset contains nearly 440 real scene images, with 500 cropped word images in Gujarati script and 2535 in Tamil script, annotated with scene-text bounding boxes and transcriptions.

Terms and Conditions: All images provided as part of the dataset have been collected from freely accessible internet sites. As such, they are copyrighted by their respective authors. The images are hosted on this site, physically located in India. Specifically, the images are provided with the sole purpose of being used for research in developing new models for scene-text recognition. You are not allowed to redistribute any images in this dataset. By downloading any of the files below, you explicitly state that your final purpose is to perform research under the conditions mentioned above, and you agree to comply with these terms and conditions.

Download

Dataset Link


Contact

  1. Sanjana Gunna
  2. Rohit Saluja

More Parameters? No Thanks!


Zeeshan Khan (CVIT, IIIT-H), Kartheek Akella (CVIT, IIIT-H), Vinay Namboodiri (Univ. of Bath), and C.V. Jawahar (CVIT, IIIT-H)

[Code]   [Presentation Video]   [TED Dataset]


Illustration of the evolution of model parameters. (a) shows the multilingual parameters in grey. Through 60% pruning and retraining we arrive at (b), where white represents the free weights with value 0. The surviving weights in grey are fixed for the rest of the method. We then train the free parameters on the first bilingual pair (L-1) and arrive at (c), which shows the initial parameters of L-1 in orange, sharing weights with the previously trained multilingual parameters in grey. After a further 50% pruning and retraining of the L-1-specific weights in orange, we obtain the final parameters for L-1, shown in (d), and extract the final mask for L-1 in (f). The same procedure is repeated for all bilingual pairs to extract a mask for each pair.

Abstract

This work studies the long-standing problems of model capacity and negative interference in multilingual neural machine translation (MNMT). Using network pruning techniques, we observe that pruning 50-70% of the parameters from a trained MNMT model results in only a 0.29-1.98 drop in BLEU score, suggesting that large redundancies exist even in MNMT models. These observations motivate us to reuse the redundant parameters to counter the interference problem efficiently. We propose a novel adaptation strategy in which we iteratively prune and retrain the redundant parameters of an MNMT model to improve bilingual representations while retaining multilinguality. Negative interference severely affects high-resource languages, and our method alleviates it without any additional adapter modules; hence we call it a parameter-free adaptation strategy, paving the way for efficient adaptation of MNMT. We demonstrate the effectiveness of our method on a 9-language MNMT model trained on TED talks and report an average improvement of +1.36 BLEU on high-resource pairs. Code will be released here.
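
As a hedged sketch of the underlying prune-and-retrain mechanics, here is a minimal magnitude-pruning example with a toy objective; it is not the authors' implementation.

    # Hedged sketch of mask-based iterative pruning: zero out the
    # smallest-magnitude weights, freeze the survivors, and retrain only the
    # freed slots on the next (bilingual) task.
    import torch

    def magnitude_prune(weight: torch.Tensor, frac: float) -> torch.Tensor:
        """Boolean mask keeping the largest-magnitude (1 - frac) of weights."""
        k = int(weight.numel() * frac)
        threshold = weight.abs().flatten().kthvalue(k).values
        return weight.abs() > threshold

    w = torch.randn(512, 512, requires_grad=True)
    keep = magnitude_prune(w.data, frac=0.6)   # 60% pruning, as in (a) -> (b)
    w.data[~keep] = 0.0                        # freed weights start at zero

    # While retraining on a bilingual pair, mask out gradients to the fixed
    # multilingual weights so only the freed slots become language-specific.
    loss = (w.sum() - 1.0) ** 2                # toy objective for illustration
    loss.backward()
    w.grad[keep] = 0.0                         # survivors stay frozen
    w.data -= 1e-2 * w.grad                    # one (toy) SGD step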


Paper

  • Paper
    More Parameters? No Thanks!

    Zeeshan Khan, Kartheek Akella, Vinay Namboodiri and C.V. Jawahar
    More Parameters? No Thanks!, Findings of ACL, 2021.
    [PDF] | [BibTeX]


Official Presentation

Our presentation at Findings of ACL 2021 is available on YouTube.


Contact

  1. Zeeshan Khan
  2. Kartheek Akella
