Fast and Accurate Image Recognition

Sri Aurobindo Munagala


DNNs (Deep Neural Networks) have found use in a variety of applications recently, and have become much larger and more resource-hungry over the years. CNNs (Convolutional Neural Networks), widely used in computer vision tasks, apply the convolution operation in successive layers, making them computationally expensive to run. Due to the large size of modern models and the computation required to run them, it is difficult to use them in resource-constrained scenarios such as mobile phones and embedded devices. Moreover, the amount of data created and collected is growing at a rapid rate, and annotating large amounts of data for training DNNs is expensive. Approaches such as Active Learning (AL) seek to intelligently query samples to train on in an iterative fashion, but AL setups are themselves very costly to run, since large models are fully trained many times in the process. In this thesis, we explore methods to achieve extremely high speedups in both CNN model inference and the training time of AL setups.

Various paradigms for fast CNN inference have been explored in the past, two major ones being binarization and pruning. Binarization quantizes the weights and/or inputs of the network from 32-bit full-precision floats into a {-1, +1} space, with the aim of both compression (single bits occupy 32 times less space than 32-bit floats) and speedup (bit-level operations can be executed faster). Network pruning, on the other hand, identifies and removes redundant parts of the network in an unstructured (individual weights) or structured (channels/layers) manner to create sparse and efficient networks. While both these paradigms have demonstrated great efficacy in achieving speedups and compression in CNNs, little work has been done on combining the two approaches.
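The bit-level speedup mentioned above can be illustrated with a minimal sketch (the function names here are illustrative, not from the thesis): once {-1, +1} vectors are packed into bitmasks, their dot product reduces to an XNOR followed by a popcount, replacing floating-point multiply-accumulates.

```python
# Minimal sketch of binary convolution arithmetic; names are illustrative.

def binarize(weights):
    """Map full-precision weights to {-1, +1} via the sign function."""
    return [1 if w >= 0 else -1 for w in weights]

def pack_bits(signs):
    """Pack {-1, +1} values into an integer bitmask (+1 -> 1, -1 -> 0)."""
    mask = 0
    for i, s in enumerate(signs):
        if s == 1:
            mask |= 1 << i
    return mask

def binary_dot(mask_a, mask_b, n):
    """Dot product of two {-1, +1} vectors over n bits:
    matches = popcount(XNOR(a, b)); dot = 2 * matches - n."""
    matches = bin(~(mask_a ^ mask_b) & ((1 << n) - 1)).count("1")
    return 2 * matches - n

w = [0.8, -1.2, 0.3, -0.1]
x = [1.5, -0.7, -2.0, 0.4]
bw, bx = binarize(w), binarize(x)
# The bitwise result matches the naive {-1, +1} dot product.
assert binary_dot(pack_bits(bw), pack_bits(bx), len(w)) == sum(
    a * b for a, b in zip(bw, bx)
)
```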
We argue that these paradigms are complementary, and can be combined to offer high levels of compression and speedup without any significant accuracy loss. Intuitively, weights and activations closer to zero have higher binarization error, making them good candidates for pruning. We propose a novel Structured Ternary-Quantized Network that retains the speedups of binary convolution algorithms through structured pruning, enabling pruned parts of the network to be removed entirely post-training. Our approach beats previous works attempting the same by a significant margin. Overall, our method brings up to 89x layer-wise compression over the corresponding full-precision networks, incurring only 0.33% accuracy loss on CIFAR-10 with ResNet-18 at a 40% PFR (Prune Factor Ratio for filters), and 0.3% on ImageNet with ResNet-18 at a 19% PFR.

We also explore the field of Active Learning, which is used in scenarios where unlabelled data is abundant, but annotation costs make it infeasible to utilize all of the data for supervised training. AL methods initially train a model (referred to as the selection model) on a small pool of annotated data, and use its predictions on the remaining unlabelled data to form a query for annotating more samples. This train-query process repeats iteratively until a sufficient amount of data is labelled. In practice, however, sample selection in AL setups takes a long time, as the selection model is fully re-trained on the labelled pool every iteration. We offer two improvements to the standard AL setup that significantly bring down the overall training time required for sample selection: first, the introduction of a “memory” that enables us to train the selection model on fewer samples every round as opposed to the entire labelled dataset so far; and second, the use of fast-convergence techniques to reduce the number of epochs the selection model trains for.
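The intuition that near-zero weights are the best pruning candidates can be sketched as a simple ternarization step (a toy illustration with an assumed fixed threshold, not the thesis's actual quantization scheme): weights within a threshold of zero are pruned to 0, and the rest are binarized to +/-1.

```python
# Toy sketch of ternary quantization to {-1, 0, +1}; the threshold
# value and function name are illustrative assumptions.

def ternarize(weights, threshold):
    """Prune weights within +/-threshold to 0; binarize the rest."""
    return [0 if abs(w) < threshold else (1 if w > 0 else -1) for w in weights]

w = [0.9, -0.05, 0.02, -1.1, 0.4]
print(ternarize(w, 0.1))  # -> [1, 0, 0, -1, 1]
```

The near-zero weights (which would have the largest binarization error) are exactly the ones removed, and with structured pruning whole filters of such zeros can be dropped from the network entirely.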
Our proposed improvements work in tandem with previous techniques such as the use of proxy models for selection, and the combined improvements bring more than 75x speedups in overall sample-selection time to standard AL setups, making them feasible and easy to run in real-life scenarios where procuring large amounts of data is easy, but labelling it is difficult.
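The overall loop being sped up can be sketched as follows (a toy illustration; function names, the replay-memory size, and the informativeness score are all assumptions, not the thesis's actual method): each round queries the most informative unlabelled samples, then trains the selection model on the new batch plus a small replay memory rather than the full labelled pool.

```python
# Toy sketch of one active-learning round with a replay "memory";
# all names and sizes here are illustrative assumptions.
import random

def active_learning_round(model, unlabelled, memory, query_size, train_fn, score_fn):
    """Query the most informative samples, then train on the newly
    labelled batch plus the replay memory only."""
    ranked = sorted(unlabelled, key=score_fn, reverse=True)
    queried, remaining = ranked[:query_size], ranked[query_size:]
    # Train on new samples + memory instead of the whole labelled pool.
    train_fn(model, queried + memory)
    # Refresh the memory with a subset of the newly labelled samples.
    memory = memory + random.sample(queried, min(len(queried), 8))
    return remaining, memory

# Toy run: "informativeness" is just the sample's value here.
unlabelled, memory, trained_on = list(range(100)), [], []
unlabelled, memory = active_learning_round(
    model=None,
    unlabelled=unlabelled,
    memory=memory,
    query_size=10,
    train_fn=lambda m, batch: trained_on.append(len(batch)),
    score_fn=lambda x: x,
)
print(len(unlabelled), trained_on, len(memory))  # -> 90 [10] 8
```

The training cost per round depends only on the query size plus the memory size, not on the ever-growing labelled pool, which is where the time savings come from.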

Year of completion: January 2022
Advisor: Anoop M Namboodiri

Related Publications