More Parameters? No Thanks!


Zeeshan Khan, Kartheek Akella, Vinay Namboodiri, C.V. Jawahar

CVIT, IIIT-H; CVIT, IIIT-H; Univ. of Bath; CVIT, IIIT-H

[Code]   [Presentation Video]   [TED Dataset]

Illustration of the evolution of model parameters. (a) shows the multilingual parameters in grey. Through 60% pruning and retraining, we arrive at (b), where white represents the free weights with value 0. The surviving weights in grey stay fixed for the rest of the method. Next, we train the free parameters on the first bilingual pair (L-1) and arrive at (c), which shows the initial parameters of L-1 in orange, sharing weights with the previously trained multilingual parameters in grey. After a further 50% pruning and retraining of the L-1 specific weights in orange, we get the final parameters for L-1 shown in (d) and extract the final mask for L-1 in (f). We repeat the same procedure for all the bilingual pairs and extract a mask for each pair.
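The mask bookkeeping described in the caption can be sketched on a single toy weight matrix. This is a minimal illustration, not the released code: it assumes magnitude-based pruning (the page only says "pruning"), uses random values as a stand-in for retraining, and the names `magnitude_prune`, `shared_mask` and `pair_specific` are hypothetical.

```python
import numpy as np

def magnitude_prune(weights, candidates, fraction):
    """Among the entries flagged in `candidates`, drop the smallest-magnitude
    `fraction` and return a boolean mask of the survivors."""
    idx = np.flatnonzero(candidates)
    n_prune = int(round(fraction * idx.size))
    order = np.argsort(np.abs(weights.ravel()[idx]), kind="stable")
    keep = candidates.copy().ravel()
    keep[idx[order[:n_prune]]] = False
    return keep.reshape(weights.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(10, 10))  # stand-in for one multilingual weight matrix

# (a) -> (b): prune 60% of the multilingual weights; survivors are frozen.
shared_mask = magnitude_prune(w, np.ones(w.shape, dtype=bool), 0.60)
w[~shared_mask] = 0.0          # free weights now have value 0
frozen = w[shared_mask].copy() # kept only to check that they never change

# (b) -> (c): train only the free weights on pair L-1.
# Assumption: random values stand in for the result of training.
free = ~shared_mask
w[free] = rng.normal(size=free.sum())

# (c) -> (d): prune 50% of the L-1 specific weights (retraining omitted).
pair_specific = magnitude_prune(w, free, 0.50)
w[free & ~pair_specific] = 0.0

# Final mask for L-1: frozen multilingual weights plus surviving L-1 weights.
l1_mask = shared_mask | pair_specific
w_l1 = np.where(l1_mask, w, 0.0)

print(shared_mask.sum(), pair_specific.sum(), l1_mask.sum())
```

With 100 weights, 40 remain shared after the first pruning step, 30 of the 60 free weights survive as L-1 specific, and the L-1 mask therefore selects 70 weights without adding any new parameters.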

Abstract

This work studies the long-standing problems of model capacity and negative interference in multilingual neural machine translation (MNMT). We use network pruning techniques and observe that pruning 50-70% of the parameters from a trained MNMT model results in a drop of only 0.29-1.98 in BLEU score, suggesting that large redundancies exist even in MNMT models. These observations motivate us to use the redundant parameters to counter the interference problem efficiently. We propose a novel adaptation strategy in which we iteratively prune and retrain the redundant parameters of an MNMT model to improve bilingual representations while retaining multilinguality. Negative interference severely affects high-resource languages, and our method alleviates it without any additional adapter modules. Hence, we call it a parameter-free adaptation strategy, paving the way for the efficient adaptation of MNMT. We demonstrate the effectiveness of our method on a 9-language MNMT model trained on TED talks, and report an average improvement of +1.36 BLEU on high-resource pairs. Code will be released here.


Paper

  • Zeeshan Khan, Kartheek Akella, Vinay Namboodiri and C.V. Jawahar
    More Parameters? No Thanks!, Findings of ACL, 2021.
    [PDF] | [BibTeX]


Official Presentation

Our presentation at Findings of ACL, 2021 is available on YouTube.


Contact

  1. Zeeshan Khan
  2. Kartheek Akella

MediTables: A New Dataset and Deep Network for Multi-category Table Localization in Medical Documents


Abstract

We will update soon.
[Code]

DocVisor: A Multi-purpose Web-based Interactive Visualizer for Document Image Analytics


Abstract

We will update soon.
[Code]

Syntactically Guided Generative Embeddings for Zero-Shot Skeleton Action Recognition


Abstract

We introduce SynSE, a novel syntactically guided generative approach for Zero-Shot Learning (ZSL). Our end-to-end approach learns progressively refined generative embedding spaces constrained within and across the involved modalities (visual, language). The inter-modal constraints are defined between the action sequence embedding and the embeddings of Parts of Speech (PoS) tagged words in the corresponding action description. We deploy SynSE for the task of skeleton-based action sequence recognition. This work has been accepted for publication at IEEE ICIP 2021.

 

Architecture

[Figure: SynSE architecture]

Citation

 
    @misc{gupta2021syntactically,
        title={Syntactically Guided Generative Embeddings for Zero-Shot Skeleton Action Recognition},
        author={Pranay Gupta and Divyanshu Sharma and Ravi Kiran Sarvadevabhatla},
        year={2021},
        eprint={2101.11530},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
    }


PALMIRA: A Deep Deformable Network for Instance Segmentation of Dense and Uneven Layouts in Handwritten Manuscripts


Abstract

Handwritten documents are often characterized by dense and uneven layouts. Despite advances, standard deep network based approaches for semantic layout segmentation are not robust to the complex deformations seen across semantic regions. This phenomenon is especially pronounced for the low-resource Indic palm-leaf manuscript domain. To address the issue, we first introduce Indiscapes2, a new large-scale diverse dataset of Indic manuscripts with semantic layout annotations. Indiscapes2 contains documents from four different historical collections and is 150% larger than its predecessor, Indiscapes. We also propose a novel deep network, Palmira, for robust, deformation-aware instance segmentation of regions in handwritten manuscripts. We also report Hausdorff distance and its variants as a boundary-aware performance measure. Our experiments demonstrate that Palmira provides robust layouts and outperforms strong baseline approaches and ablative variants. We also include qualitative results on Arabic, South-East Asian and Hebrew historical manuscripts to showcase the generalization capability of Palmira.

 

Architecture

The PALMIRA architecture. The orange blocks in the backbone are deformable convolutions. A closer look at the DefGrid Mask Head and the Backbone alteration is provided below in the Network Architecture section.

Highlights

We introduce Indiscapes2, a collection of handwritten palm-leaf manuscripts which is 150% larger than its predecessor, Indiscapes, and contains two additional annotated collections that greatly increase qualitative diversity. In addition, we introduce a novel deep learning based layout parsing architecture called Palm leaf Manuscript Region Annotator, or Palmira for short. Through our experiments, we show that Palmira outperforms the previous approach and strong baselines, qualitatively and quantitatively. Additionally, we demonstrate the general nature of our approach on out-of-dataset historical manuscripts. Complementing the traditional area-centric measures of Intersection-over-Union (IoU) and mean Average Precision (AP), we report the boundary-centric Hausdorff distance and its variants as part of our evaluation approach.
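To see why a boundary-centric measure complements an area-centric one, the sketch below compares IoU with the standard symmetric Hausdorff distance on a toy pair of masks. It is an illustrative example only: the masks are made up, the helper names are hypothetical, and the variants of Hausdorff distance used in the evaluation are not reproduced here.

```python
import numpy as np

def iou(mask_a, mask_b):
    """Area-centric overlap between two boolean masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 1.0

def boundary_points(mask):
    """Pixels of the mask with at least one 4-neighbour outside the mask."""
    p = np.pad(mask, 1, constant_values=False)
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return np.argwhere(mask & ~interior)

def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest point in `b`."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).max()

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Ground truth: a 10x10 square. Prediction: the same square plus a thin
# one-pixel-wide spike that leaks 5 pixels past the right edge.
gt = np.zeros((20, 20), dtype=bool)
gt[5:15, 5:15] = True
pred = gt.copy()
pred[9, 15:20] = True

area_score = iou(gt, pred)
boundary_error = hausdorff(boundary_points(gt), boundary_points(pred))
print(round(area_score, 3), boundary_error)
```

The spike adds only 5 pixels to a 100-pixel region, so IoU stays above 0.95, while the Hausdorff distance reports the full 5-pixel boundary deviation.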

Network Architecture

1) Deformable Convolutions in the Backbone

Deformable convolutions provide a way to determine suitable local 2D offsets for the default spatial sampling locations of a convolution kernel.
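The sampling idea can be sketched in plain NumPy for a single-channel image and a 3x3 kernel. This is a simplified illustration rather than Palmira's backbone: in a deformable convolution layer the offsets are predicted from the input features by an additional convolution, whereas here they are simply passed in, and the function names are hypothetical.

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Bilinearly interpolate `img` at a real-valued (y, x); zero outside."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for yi, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xi, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yi < h and 0 <= xi < w:
                val += wy * wx * img[yi, xi]
    return val

def deformable_conv2d(img, kernel, offsets):
    """3x3 deformable convolution, stride 1, zero padding.

    img:     (H, W) input
    kernel:  (3, 3) weights
    offsets: (H, W, 9, 2) per-location, per-tap (dy, dx) added to the
             default sampling position of each kernel tap
    """
    h, w = img.shape
    taps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            for k, (dy, dx) in enumerate(taps):
                oy, ox = offsets[i, j, k]
                sample = bilinear_sample(img, i + dy + oy, j + dx + ox)
                out[i, j] += kernel[dy + 1, dx + 1] * sample
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.zeros((3, 3))
kernel[1, 1] = 1.0                    # only the centre tap is active
offsets = np.zeros((5, 5, 9, 2))
offsets[..., 4, 1] = 0.5              # move the centre tap half a pixel right
out = deformable_conv2d(img, kernel, offsets)
```

With all offsets set to zero the function reduces to an ordinary 3x3 convolution (cross-correlation) with zero padding; non-zero, fractional offsets move each tap off the regular grid and are resolved by bilinear interpolation.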

2) Deformable Grid Mask Head

The Deformable Grid Mask Head network is optimized to predict the offsets of the grid vertices such that a subset of edges incident on the vertices form a closed contour which aligns with the region boundary.
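The geometric idea, vertices that move by predicted offsets and a subset of grid edges that closes into a contour, can be illustrated with a toy grid. The sketch below makes several simplifying assumptions that do not come from the page: square grid cells, hand-set offsets instead of network predictions, and a hand-picked set of cells marking the region. All function names are hypothetical.

```python
import numpy as np
from collections import Counter

def grid_vertices(rows, cols):
    """Vertex coordinates (y, x) in [0, 1]^2 for a grid of rows x cols cells."""
    ys, xs = np.meshgrid(np.linspace(0, 1, rows + 1),
                         np.linspace(0, 1, cols + 1), indexing="ij")
    return np.stack([ys, xs], axis=-1)

def boundary_edges(cell_selected):
    """Grid edges that belong to exactly one selected cell."""
    count = Counter()
    for r, c in np.argwhere(cell_selected).tolist():
        corners = [(r, c), (r, c + 1), (r + 1, c + 1), (r + 1, c)]
        for a, b in zip(corners, corners[1:] + corners[:1]):
            count[frozenset((a, b))] += 1
    return [tuple(sorted(e)) for e, n in count.items() if n == 1]

def trace_contour(edges):
    """Order the vertices of a simple closed contour given its edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    start = min(adj)
    path, prev, cur = [start], None, start
    while True:
        nxt = [v for v in adj[cur] if v != prev][0]
        if nxt == start:
            return path
        path.append(nxt)
        prev, cur = cur, nxt

verts = grid_vertices(4, 4)
selected = np.zeros((4, 4), dtype=bool)
selected[1:3, 1:3] = True               # assumed region: a 2x2 block of cells

edges = boundary_edges(selected)
contour = trace_contour(edges)          # ordered vertex indices (row, col)

offsets = np.zeros_like(verts)          # stand-in for predicted offsets
offsets[1, 1] = (-0.05, -0.05)          # pull one corner outward
deformed = verts + offsets
polygon = np.array([deformed[r, c] for r, c in contour])
```

Because the contour is defined by vertex indices rather than coordinates, moving the vertices deforms the polygon while it stays closed, which is what lets offset prediction align it with a region boundary.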


Results

1) Indiscapes2 Test Set - Document Level

Layout predictions by Palmira on representative test set documents from Indiscapes2 dataset. Note that the colors are used to distinguish region instances. The region category abbreviations are present at corners of the regions.

2) Indiscapes2 Test Set - Region Level Performance

A comparative illustration of region-level performance. Palmira’s predictions are in red. Predictions from the best model among baselines (BoundaryPreserving Mask-RCNN) are in green. Ground-truth boundary is depicted in white.

3) Other Documents (Out of Dataset)

Layout predictions by Palmira on out-of-dataset handwritten manuscripts.


Citation

 
    @inproceedings{sharan2021palmira,
        title = {PALMIRA: A Deep Deformable Network for Instance Segmentation of Dense and Uneven Layouts in Handwritten Manuscripts},
        author = {Sharan, S P and Aitha, Sowmya and Amandeep, Kumar and Trivedi, Abhishek and Augustine, Aaron and Sarvadevabhatla, Ravi Kiran},
        booktitle = {International Conference on Document Analysis and Recognition,
                   {ICDAR} 2021},
        year = {2021},
    }

Contact

If you have any questions, please contact Dr. Ravi Kiran Sarvadevabhatla.

