DocFigure: A Dataset for Scientific Document Figure Classification
Authors: K V Jobin, Ajoy Mondal and C. V. Jawahar
Abstract
Document figure classification (DFC) is an important stage of a document image understanding system. Designing a DFC system requires well-defined figure categories and a dataset. To the best of the authors' knowledge, the existing datasets for classifying figures in document images are limited in both size and number of categories [1]–[3]. In this paper, we introduce a scientific figure classification dataset named DocFigure. The dataset consists of 33K annotated figures in 28 categories, taken from document images of scientific articles published at conferences such as CVPR, ECCV and ICCV over the last several years. Manually annotating such a large number (33K) of figures is time-consuming and costly. We therefore design a web-based annotation tool that can efficiently assign category labels to a large number of figures with minimal effort from human annotators. To benchmark the dataset on the classification task, we propose three baseline classification techniques using the deep feature, the deep texture feature, and a combination of both. In our analysis, we found that the combination of the deep feature and the deep texture feature is more effective for document figure classification than either feature alone.
Keywords: Document figure classification; transfer learning; deep feature; deep texture feature.
Dataset: DocFigure
Fig1: Visual illustration of category-wise sample figure images from our DocFigure dataset. The 28 categories correspond to (a) Line graph, (b) Natural image, (c) Table, (d) 3D object, (e) Bar plot, (f) Scatter plot, (g) Medical image, (h) Sketch, (i) Geographic map, (j) Flow chart, (k) Heat map, (l) Mask, (m) Block diagram, (n) Venn diagram, (o) Confusion matrix, (p) Histogram, (q) Box plot, (r) Vector plot, (s) Pie chart, (t) Surface plot, (u) Algorithm, (v) Contour plot, (w) Tree diagram, (x) Bubble chart, (y) Polar plot, (z) Area chart, (A) Pareto chart and (B) Radar chart.
Table I: Comparison of our DocFigure dataset with the existing Deepchart [2], Figureseer [3] and Revision [1] datasets with respect to category labels and samples. The last column indicates the number of images in each class.
Work-flow for generation of DocFigure dataset
Fig2: Work-flow for generation of the DocFigure dataset. Annotation is done in two stages. In stage I, part of the dataset is annotated using incremental learning with the help of an annotator. In stage II, the remaining part is annotated using similarity scores between the remaining images, again with the help of an annotator. Finally, the complete annotated dataset is generated.
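Stage II can be pictured as similarity-based label suggestion. The sketch below is only an illustration under stated assumptions, not the annotation tool itself: it assumes every figure is already represented by a feature vector, it uses cosine similarity as the similarity score, and it introduces a hypothetical threshold `min_score` below which no label is suggested and the figure is left entirely to the annotator. The function names and toy vectors are invented for the example.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def propose_labels(labelled, unlabelled, min_score=0.9):
    """Suggest a label for each unlabelled figure from its most similar labelled figure.

    labelled:   list of (feature_vector, label) pairs, e.g. the stage I output
    unlabelled: dict mapping figure name -> feature_vector
    min_score:  hypothetical cut-off; weaker matches get no suggestion (None)
    Returns a dict mapping figure name -> (suggested_label_or_None, best_score).
    """
    proposals = {}
    for name, feat in unlabelled.items():
        best_label, best_score = None, -1.0
        for ref_feat, ref_label in labelled:
            score = cosine_similarity(feat, ref_feat)
            if score > best_score:
                best_label, best_score = ref_label, score
        if best_score < min_score:
            best_label = None  # too dissimilar: leave the decision to the annotator
        proposals[name] = (best_label, best_score)
    return proposals

# Toy 3-dimensional features, purely illustrative.
labelled = [([1.0, 0.0, 0.2], "Line graph"), ([0.0, 1.0, 0.1], "Bar plot")]
unlabelled = {"fig_a": [0.9, 0.1, 0.2], "fig_b": [0.5, 0.5, 0.5]}
proposals = propose_labels(labelled, unlabelled)
```

In this toy run `fig_a` receives the suggestion "Line graph", while `fig_b` is not close enough to any labelled figure and gets no suggestion. In either case the annotator is assumed to make the final call.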
Link to the Download: [ Dataset ] * [ Code ] *
Note: * Their use is restricted to non-commercial research and educational purposes.
Baseline
Fig3: Basic framework for the three proposed baseline approaches. The red dotted rectangle corresponds to the FC-CNN feature extraction block, the blue dotted rectangle indicates the FV-CNN feature extraction module, and the black dotted rectangle corresponds to the classification module (best viewed in color).
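The feature combination used by the third baseline can be sketched as follows. This is a minimal illustration, not the authors' implementation: the input vectors stand in for precomputed FC-CNN and FV-CNN descriptors, the per-descriptor L2 normalisation before concatenation is an assumption, and the nearest-centroid classifier is a deliberately simple stand-in for the classification module, whose details are not given on this page.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse(fc_feat, fv_feat):
    """Concatenate the L2-normalised FC-CNN and FV-CNN descriptors."""
    return l2_normalize(fc_feat) + l2_normalize(fv_feat)

def fit_centroids(samples):
    """samples: list of (fused_vector, label). Returns label -> mean fused vector."""
    sums, counts = {}, {}
    for vec, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def predict(centroids, vec):
    """Return the label whose centroid is closest to vec in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], vec))

# Toy descriptors (2-d "FC-CNN", 3-d "FV-CNN"); real descriptors are far larger.
train = [
    (fuse([2.0, 0.1], [0.9, 0.1, 0.0]), "Table"),
    (fuse([1.8, 0.3], [0.8, 0.2, 0.1]), "Table"),
    (fuse([0.1, 2.2], [0.0, 0.2, 0.9]), "Heat map"),
    (fuse([0.2, 1.9], [0.1, 0.1, 0.8]), "Heat map"),
]
centroids = fit_centroids(train)
query = fuse([1.9, 0.2], [0.85, 0.15, 0.05])
predicted = predict(centroids, query)
```

Normalising each descriptor separately keeps one feature type from dominating the concatenated vector simply because of its scale. Any classifier that accepts a fixed-length vector could replace the nearest-centroid step.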
Results:
Labels | FC-CNN | FV-CNN | FC-CNN + FV-CNN |
---|---|---|---|
3D objects | 98.24% | 94.73% | 98.53% |
Algorithm | 93.81% | 91.75% | 93.81% |
Bar plots | 93.97% | 91.97% | 93.64% |
Box plot | 91.39% | 88.07% | 92.05% |
Flow chart | 92.53% | 91.14% | 97.01% |
Heat map | 99.25% | 95.89% | 99.62% |
Histogram | 94.89% | 88.26% | 94.89% |
Medical images | 97.87% | 92.55% | 98.93% |
Pie chart | 91.66% | 89.81% | 94.44% |
Polar plot | 85.71% | 78.57% | 85.71% |
Area chart | 84.61% | 91.02% | 92.30% |
Block diagram | 97.26% | 97.65% | 98.43% |
Bubble Chart | 80.95% | 91.66% | 90.47% |
Confusion matrix | 85.22% | 89.65% | 93.10% |
Contour plot | 59.34% | 74.72% | 72.52% |
Geographic map | 88.59% | 95.81% | 95.43% |
Line graph * | 98.49% | 98.84% | 99.33% |
Mask | 99.23% | 99.23% | 99.23% |
Natural images | 98.04% | 98.25% | 99.23% |
Pareto charts | 87.17% | 96.15% | 97.43% |
Radar chart | 78.94% | 86.84% | 85.52% |
Scatter plot | 90.14% | 91.19% | 93.66% |
Sketches | 95.65% | 96.37% | 98.18% |
Surface plot | 76.76% | 89.89% | 88.88% |
Tables | 97.25% | 98.73% | 97.67% |
Tree Diagram | 67.04% | 68.18% | 70.45% |
Vector plot | 79.86% | 81.94% | 86.80% |
Venn Diagram | 87.03% | 93.51% | 93.05% |
Average | 88.96% | 90.80% | 92.90% |
Table3: The class-wise accuracy of the 28 classes in our proposed dataset DocFigure using the shape feature (FC-CNN), the texture feature (FV-CNN) and the combination of both (FC-CNN+FV-CNN). The labels written in italics are more discriminative in shape feature than texture feature.
Note: * In the main paper, Line graph was mistakenly named Graph plots.
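The Average row can be read as an unweighted (macro) mean over the 28 classes. The sketch below recomputes that mean from the per-class values in the table. The macro-average reading is an assumption rather than something stated on this page. The last lines also list the classes where FC-CNN scores strictly higher than FV-CNN, which is one plausible interpretation of "more discriminative in shape feature".

```python
from statistics import mean

# Per-class accuracy (%) copied from the table: (FC-CNN, FV-CNN, FC-CNN + FV-CNN).
per_class = {
    "3D objects": (98.24, 94.73, 98.53),
    "Algorithm": (93.81, 91.75, 93.81),
    "Bar plots": (93.97, 91.97, 93.64),
    "Box plot": (91.39, 88.07, 92.05),
    "Flow chart": (92.53, 91.14, 97.01),
    "Heat map": (99.25, 95.89, 99.62),
    "Histogram": (94.89, 88.26, 94.89),
    "Medical images": (97.87, 92.55, 98.93),
    "Pie chart": (91.66, 89.81, 94.44),
    "Polar plot": (85.71, 78.57, 85.71),
    "Area chart": (84.61, 91.02, 92.30),
    "Block diagram": (97.26, 97.65, 98.43),
    "Bubble Chart": (80.95, 91.66, 90.47),
    "Confusion matrix": (85.22, 89.65, 93.10),
    "Contour plot": (59.34, 74.72, 72.52),
    "Geographic map": (88.59, 95.81, 95.43),
    "Line graph": (98.49, 98.84, 99.33),
    "Mask": (99.23, 99.23, 99.23),
    "Natural images": (98.04, 98.25, 99.23),
    "Pareto charts": (87.17, 96.15, 97.43),
    "Radar chart": (78.94, 86.84, 85.52),
    "Scatter plot": (90.14, 91.19, 93.66),
    "Sketches": (95.65, 96.37, 98.18),
    "Surface plot": (76.76, 89.89, 88.88),
    "Tables": (97.25, 98.73, 97.67),
    "Tree Diagram": (67.04, 68.18, 70.45),
    "Vector plot": (79.86, 81.94, 86.80),
    "Venn Diagram": (87.03, 93.51, 93.05),
}

methods = ("FC-CNN", "FV-CNN", "FC-CNN + FV-CNN")

# Unweighted mean over classes: every class counts equally, whatever its size.
macro = {m: mean(row[i] for row in per_class.values()) for i, m in enumerate(methods)}
for m in methods:
    print(f"{m}: {macro[m]:.2f}%")

# Classes where the shape feature (FC-CNN) strictly beats the texture feature (FV-CNN).
shape_leaning = [label for label, (fc, fv, _) in per_class.items() if fc > fv]
print(shape_leaning)
```

Under this reading the FC-CNN and FV-CNN columns reproduce the reported 88.96% and 90.80%. The combined column comes out at about 92.87%, slightly below the reported 92.90%, so the reported averages may have been computed or rounded differently. The strict comparison selects the ten classes from 3D objects through Polar plot.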
Publication
- K V Jobin, Ajoy Mondal and C V Jawahar, DocFigure: A Dataset for Scientific Document Figure Classification, 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), Sydney, NSW, Australia, 2019
Bibtex
@InProceedings{jobin2019,
author = "K V Jobin and Ajoy Mondal and C V Jawahar",
title = "DocFigure: A Dataset for Scientific Document Figure Classification",
booktitle = "2019 International Conference on Document Analysis and Recognition Workshops (ICDARW)",
year = "2019"
}