Design of Hierarchical Classifiers for Efficient and Accurate Pattern Classification

MNSSK Pavan Kumar

Pattern recognition is an important area of research in information science with widespread application in image analysis, language processing, data mining, bioinformatics, etc. Most of the research in this area was centered around building efficient classifiers for recognition problems involving two classes of samples. However, most of the real world pattern recognition problems are not simple binary problems; they have multiple classes to recognize. It is only in the recent past that researchers have started to focus on building classifiers for multiple class classification. There are two popular directions for developing multiple class classifiers --- one is to extend the theory of binary classification directly to adopt to the multiple classes, and the other is to build large multiclass systems with efficient binary classifiers as components. The latter paradigm has received much attention and popularity due to its empirical success in various applications such as Optical Character Recognition, Biometrics, Data Mining etc. This Thesis focuses on the classifier combination paradigm to design efficient and accurate classifiers for multiclass recognition of large number of classes.

Decision Directed Acyclic Graph (DDAG) classifiers integrate a set of pairwise classifiers using a graph based architecture. Accuracy of the DDAG can be improved by appropriate design of the individual nodes. An optimal feature selection scheme is employed for improving the performance of the nodes. This feature selection scheme extracts separate set of features for each node, which increases the time taken for a sample to be classified. A novel approach for the computation of the condendsed representation of the large set of features using LDA followed by PCA is proposed to overcome this problem. Nodes with low performance are boosted using a popular boosting algorithm called Adaboost. Improvement in performance is demonstrated on Character Recognition and other datasets.

The arrangement of nodes in a DDAG is shown to affect the classification performance of the overall classifier. Popular DDAG algorithm is very accurate; however provides no more information than the class label. We modify this algorithm to handle a `reject-class' for improving the performance. The design of this DDAG is posed as an optimization problem in terms of misclassification probabilities. The problem being NP-Hard, approximate algorithms are proposed to design an improved DDAG. It is experimentally shown that the proposed algorithm provides a design close to the average case performance of the DDAG.

Although DDAGs are an attractive way to build multiclass classifiers, they suffer from the large number of component classifiers they employ. A Binary Hierarchical Classifier (BHC) is an alternate approach, which uses very few component classifiers unlike a DDAG. We propose a new scheme to divide the set of available classes into overlapping partitions so as to maximize the margin at each node of the BHC, and thereby improving the performance. BHC and DDAGs are the classifiers with complementary advantages. A DDAG has high accuracy but has high storage and classification time complexity. BHC is extremely efficient in storage, with a reasonably high accuracy. We exploit the advantages of both these classifiers by designing a new hybrid classifier. This employs binary partitions at most of the nodes where the classifications are relatively easy, and DDAGs are used for classifying a complex subset of classes wherever appropriate. The hybrid architecture is shown to perform better than the BHC, with a performance close to that of the DDAG, but with a great reduction in the number of nodes required compared to that of a complete DDAG.

This thesis presents a spectrum of classifier design algorithms for improving performance by feature selection, component classifier selection and combination architecture design. The problems are formulated in their generality, analyzed and the proposed solutions are are empirically evaluated on popular datasets. Experimental results demonstrate that these techniques are promising.


Year of completion:  2005
 Advisor :

C. V. Jawahar

Related Publications

  • M. N. S. S. K. Pavan Kumar and C. V. Jawahar, Design of Hierarchical Classifier with Hybrid Architectures, Proceedings of First International Conference on Pattern Recognition and Machine Intelligence(PReMI 2005) Kolkata, India. December 2005, pp 276-279. [PDF]

  • M. N. S. S. K. Pavan Kumar and C. V. Jawahar, Configurable Hybrid Architectures for Character Recognition Applications, Proceedings of Eighth International Conference on Document Analysis and Recognition(ICDAR), Seoul, Korea 2005, Vol 1, pp 1199-1203. [PDF]

  • MNSSK Pavan Kumar and C. V. Jawahar - On Improving Design of Multiclass Classifiers, Proceedings of the International Conference on Advances in Pattern Recognition(ICAPR), Dec. 2003, Calcutta, India, pp. 109--112. [PDF]

  • C. V. Jawahar, MNSSK Pavan Kumar and S. S. Ravikiran - A Bilingual OCR system for Hindi-Telugu Documents and its Applications, Proceedings of the International Conference on Document Analysis and Recognition(ICDAR) Aug. 2003, Edinburgh, Scotland, pp. 408--413. [PDF]