Graphical Object Detection in Document Images

Title: Graphical Object Detection in Document Images
Authors: Ranajit Saha, Ajoy Mondal and C. V. Jawahar

Abstract

Graphical elements, particularly tables and figures, contain a visual summary of the most valuable information in a document. Localizing such graphical objects in document images is therefore the first step towards understanding their content and the content of the document as a whole. In this paper, we present a novel end-to-end trainable deep learning-based framework, called Graphical Object Detection (GOD), to localize graphical objects in document images. Our framework is data-driven and does not require any heuristics or meta-data to locate graphical objects. GOD exploits transfer learning and domain adaptation to handle the scarcity of labeled training images for the graphical object detection task. Performance analysis carried out on various public benchmark data sets (ICDAR-2013, ICDAR-POD 2017 and UNLV) shows that our model yields promising results compared to state-of-the-art techniques.

Keywords: Graphical object localization; deep neural network; transfer learning; data-driven.

Graphical Object Detection


Figure 2: GOD framework. The model takes an image as input and generates a feature map. The RPN proposes regions based on the feature map, and the detection network uses the proposed regions and the feature map to detect the various graphical objects.
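The figure describes a standard two-stage detection pipeline: a backbone feature extractor, a region proposal network (RPN) and a detection head. As an illustration only, the minimal Python sketch below builds a comparable detector from torchvision's COCO-pretrained Faster R-CNN and swaps its box-predictor head for the graphical-object classes; it is not the authors' released implementation, and the class list and torchvision version (>= 0.13) are assumptions.

import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Assumed label set: background + table + figure + equation.
NUM_CLASSES = 4

def build_god_like_detector():
    # COCO-pretrained backbone, FPN and RPN (transfer learning, as in the paper).
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    # Replace the detection head so it predicts the graphical-object classes.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)
    return model

if __name__ == "__main__":
    model = build_god_like_detector()
    model.eval()
    page = torch.rand(3, 1000, 750)          # dummy document page in [0, 1]
    with torch.no_grad():
        prediction = model([page])[0]
    print(prediction["boxes"].shape, prediction["labels"], prediction["scores"])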


Experiments

Ablation Study

Table I: Performance of different backbone models on ICDAR-POD 2017 with an IoU threshold of 0.6. Deeper pre-trained models obtain higher detection accuracy. AP: average precision. Bold values indicate the best results.



Table II: A higher IoU threshold reduces the performance of the GOD (Faster R-CNN) model on ICDAR-POD 2017 due to the reduction in the number of true positives.
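The drop in Table II follows directly from how true positives are counted: a predicted box is accepted only if its IoU with a ground-truth box exceeds the threshold. The small helper below (purely illustrative, not part of the paper's evaluation code; the box coordinates are made up) shows a slightly shifted table prediction that counts as correct at IoU = 0.6 but is rejected at IoU = 0.8.

def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

pred = (100, 100, 500, 380)   # predicted table box (made-up coordinates)
gt   = (140, 120, 540, 400)   # ground-truth table box (made-up coordinates)
print(round(iou(pred, gt), 2))  # ~0.72: true positive at IoU = 0.6, missed at IoU = 0.8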


Graphical Object Detection in Document Images

Comparison with the state of the art on ICDAR-POD 2017

Table III: Comparison with state-of-the-art methods based on mAP and average F1 with IoU = 0.8 and 0.6, respectively, on the ICDAR-POD 2017 data set. Our GOD (Mask R-CNN) outperforms the state of the art at both IoU = 0.8 and IoU = 0.6. Bold values indicate the best results.


Table Detection in Document Images

Comparison with the state of the art on ICDAR-2013

Table IV: Comparison with state-of-the-art methods based on recall, precision and F1 with IoU = 0.5 on the ICDAR-2013 data set. Bold values indicate the best results.
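Tables IV and V report recall, precision and F1 at IoU = 0.5. For completeness, the tiny helper below shows how these scores are derived from the counts of matched and unmatched detections; it is a generic illustration with made-up counts, not the official competition evaluation script.

def detection_scores(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Example: 92 tables matched at IoU >= 0.5, 4 spurious detections, 6 missed tables.
print(detection_scores(92, 4, 6))  # -> (0.958..., 0.938..., 0.948...)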


Comparison with the state of the art on UNLV

Table V: Comparison with state-of-the-art methods on the UNLV data set based on recall, precision and F1 with IoU = 0.5. Bold values indicate the best results.


Results on GO-IIIT-5K data set


Table VI: Performance of the trained GOD model on GO-IIIT-5K with IoU = 0.5. GOD: trained on the public benchmark ICDAR-2013, ICDAR-POD 2017, UNLV and Marmot data sets and tested on the test images of the GO-IIIT-5K data set. GOD†: trained on the public benchmark ICDAR-2013, ICDAR-POD 2017, UNLV and Marmot data sets, then fine-tuned on the training images and tested on the test images of the GO-IIIT-5K data set. ‘*’ indicates that the data set is randomly divided into training and test sets. ‘**’ indicates that the data set is divided into training and test sets based on company. Bold values indicate the best results.
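GOD† in Table VI is obtained by further training the benchmark-trained model on the GO-IIIT-5K training images. A minimal sketch of such a fine-tuning loop is shown below, assuming a torchvision-style detector (for example, the one sketched under Figure 2) and a hypothetical train_loader yielding lists of images and target dictionaries; the hyper-parameters are illustrative, not the paper's.

import torch

def fine_tune(model, train_loader, epochs=5, lr=1e-4, device="cuda"):
    # Continue training a detector that was already trained on the public benchmarks.
    model.to(device).train()
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=lr, momentum=0.9, weight_decay=5e-4)
    for _ in range(epochs):
        for images, targets in train_loader:
            images = [img.to(device) for img in images]
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
            # torchvision detection models return a dict of losses in training mode.
            losses = model(images, targets)
            loss = sum(losses.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model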


Qualitative Results


Figure 3: (a) Results of graphical object (table, figure and equation) localization using GOD (Mask R-CNN) on the ICDAR-POD 2017 data set. Blue, green and red represent the predicted bounding boxes of table, figure and equation, respectively. (b) Results of table localization using GOD (Mask R-CNN) on document images from the ICDAR-2013 data set. Blue represents the predicted bounding box of the table.


Figure 4: (a) Results of table localization using GOD (Mask R-CNN) on document images from the UNLV data set. (b) Results of table localization using GOD (Mask R-CNN) on document images from the GO-IIIT-5K data set. Blue represents the predicted bounding box of the table.


Failure Examples


Figure 5: Examples where GOD fails to accurately localize graphical objects. (a) A figure is detected as a table. (b) A table is only partially detected (here, the object of interest is the table). (c) Some equations are not detected. (d) A figure is detected as both a figure and a table. (e) A single figure is detected as multiple figures. (f) A paragraph is detected as a table. Blue, green and red represent the predicted bounding boxes of table, figure and equation, respectively.


Links to download: [Dataset], [Code] & [PDF]


Publication

  • Ranajit Saha, Ajoy Mondal and C. V. Jawahar, Graphical Object Detection in Document Images, International Conference on Document Analysis and Recognition (ICDAR), 2019.

Cite this paper as:

@inproceedings{saha2019graphical,
title={Graphical Object Detection in Document Images},
author={Saha, Ranajit and Mondal, Ajoy and Jawahar, C V},
booktitle={2019 International Conference on Document Analysis and Recognition (ICDAR)},
pages={51--58},
year={2019},
organization={IEEE}
}


Team

  • Ranajit Saha
  • Dr. Ajoy Mondal
  • Prof. C. V. Jawahar