Doctoral Dissertations

Epsilon Focus Photography: A Study of Focus, Defocus and Depth-of-field

Parikshit Sakurikar

Abstract

Focus, defocus and depth-of-field are integral aspects of a photograph captured using a wide-aperture camera. Focus and defocus blur provide critical cues for estimation of scene depth and structure, which help in scene understanding or post-capture image manipulation. Focus and defocus blur are also used creatively by photographers to produce remarkable compositional effects such as emphasis on the foreground subject with aesthetic bokeh in the background. Epsilon Focus Photography is a branch of computational photography that deals with the capture and processing of multi-focus imagery, where multiple wide-aperture images are captured with a small change in focus position. In this thesis, we provide a comprehensive study of various problems in epsilon focus photography along with a detailed analysis of the related work in the area. We provide useful constructs for the understanding and manipulation of focus, defocus blur and the depth-of-field of an image. The work in this thesis can be divided into four broad categories: measurement, representation, manipulation and applications of focus. Measuring focus is a long-studied and challenging problem in computer vision. We study various methods to measure focus and propose a composite measure of focus that combines the strengths of well-known focus measures. We then study the task of post-capture focus manipulation at each pixel in an image and formulate a novel representation of focus that can find much use in image editing toolkits. Our representation can faithfully encode the fine characteristics of a wide-aperture image even at complex interaction locations such as depth-edges and over-saturated background regions, while optimizing the memory footprint of multi-focus imagery. Apart from precise geometric constructs for scene refocusing, we also propose a data-driven approach for post-capture scene refocusing using deep adversarial learning.
We show how the tasks of deblurring an image, magnification of the defocused content and overall comprehensive focus manipulation can be efficiently modeled using conditional adversarial networks. We study several applications of focus in computer vision such as view interpolation and depth-from-focus. We provide a tool that can interpolate different views of a scene based on focus texture segmentation and propose a novel solution for depth-from-focus using the proposed composite focus measure. In summary, this thesis consists of a comprehensive study of epsilon focus photography and its applications in the context of computer vision and computational photography.
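
As a rough illustration of the depth-from-focus idea mentioned in the abstract, the sketch below picks, for each pixel, the focal-stack slice with the strongest focus response. It is not the thesis's method: it uses a plain squared-Laplacian focus measure rather than the proposed composite measure, and the function names `laplacian_energy` and `depth_from_focus` are hypothetical.

```python
def laplacian_energy(img, r, c):
    """Squared 4-neighbour Laplacian response at interior pixel (r, c)."""
    lap = (img[r - 1][c] + img[r + 1][c] + img[r][c - 1] + img[r][c + 1]
           - 4 * img[r][c])
    return lap * lap


def depth_from_focus(stack):
    """Return, per interior pixel, the index of the sharpest slice.

    `stack` is a list of equally sized 2D grayscale images (lists of rows),
    one per focus position. Border pixels are left as None because the
    Laplacian needs all four neighbours. Ties go to the lowest index.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    depth = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            scores = [laplacian_energy(img, r, c) for img in stack]
            depth[r][c] = max(range(len(stack)), key=scores.__getitem__)
    return depth


# Toy focal stack: slice 0 is flat (defocused), slice 1 has a sharp centre.
flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
sharp = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
depth_map = depth_from_focus([flat, sharp])
```

In this toy stack the centre pixel is assigned to slice 1, the only slice with local contrast there. A real pipeline would typically aggregate the measure over a window and smooth the resulting index map.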

Year of completion:	August 2021
Advisor :	P J Narayanan

Related Publications

Downloads

Optimization for and by Machine Learning

Pritish Mohapatra

Abstract

In machine learning, tasks like making predictions using a model and learning model parameters can often be formulated as optimization problems. The feasibility of using a machine learning model depends on the efficiency with which the corresponding optimization problems can be solved. As such, the area of machine learning throws up many challenges and interesting problems for research in the field of optimization. While in some cases, it is possible to directly apply off-the-shelf optimization methods for problems in machine learning, in many other cases, it becomes necessary to develop optimization algorithms that are tailor-made for specific problems. On the other hand, developing optimization algorithms for specific problem domains can itself be helped by machine learning techniques. Learning optimization algorithms from data can help relieve tedious effort required to develop optimization methods for new problem domains. The challenge here is to appropriately parameterize the space of algorithms for different optimization problems. In this context, we explore the interplay between the areas of optimization and machine learning and make contributions in specific problems of interest that lie in the overlap of these fields.

Year of completion:	December 2021
Advisor :	C. V. Jawahar

Related Publications

Pritish Mohapatra, C. V. Jawahar and M. Pawan Kumar - Learning to Round for Discrete Labeling Problems, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2018, 09 - 11 April 2018, Playa Blanca, Lanzarote. [PDF]
Pritish Mohapatra, Michal Rolínek, C. V. Jawahar, Vladimir Kolmogorov and M. Pawan Kumar - Efficient Optimization for Rank-based Loss Functions, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, 18 - 22 June 2018, Salt Lake City, Utah. [PDF]
Pritish Mohapatra, Puneet Kumar Dokania, C.V Jawahar and M. Pawan Kumar - Partial Linearization based Optimization for Multi-class SVM, Proceedings of European Conference on Computer Vision, (ECCV) – Amsterdam, The Netherlands, 2016. [PDF]
Aseem Behl, Pritish Mohapatra, C. V. Jawahar, M. Pawan Kumar - Optimizing Average Precision using Weakly Supervised Data, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI 2015). [PDF]
Mohak Sukhwani, Suriya Singh, Anirudh Goyal, Aseem Behl, Pritish Mohapatra, Brijendra Kumar Bharti, C.V. Jawahar - Monocular Vision based Road Marking Recognition for Driver Assistance and Safety, Proceedings of the IEEE Conference on Vehicular Electronics and Safety, 16-17 Dec 2014, Hyderabad, India. [PDF]
Pritish Mohapatra, C.V. Jawahar and M. Pawan Kumar - Efficient Optimization for Average Precision SVM, Proceedings of the Neural Information Processing Systems Foundation, 08-13 Dec 2014, Quebec, Canada. [PDF]

Downloads

Learning Representations for Word Images

Praveen Krishnan

Abstract

Reading and writing documents is one of the primary skills with which we gather and communicate information. With the emergence of artificial intelligence (AI), researchers are in constant pursuit to build intelligent algorithms that can bring our physical and digital worlds close to each other. One such important domain is document image analysis, where we delve into the problem of understanding content from scanned document image collections. Considering “words” as the basic unit in understanding a document, in this thesis, we address the problem of finding the best possible representation for word images. Representation learning has been a key line of investigation for AI problems. The primary goal of this thesis is to learn efficient representations for word images that encode their content. An ideal representation should be invariant to multiple fonts and handwriting styles, and less sensitive to noise and distortions. In the past, representations have been handcrafted, specific to modalities (printed, handwritten), and sensitive to the complexities in handwriting in multi-writer scenarios. In this work, we choose the paradigm of learning from data using deep neural networks. We take our inspiration from the fact that given large amounts of annotated data, modern deep neural networks can inherently learn better representations. In this thesis, we also relax the need for large annotated datasets by heavily capitalizing on synthetically generated images. We also introduce a novel problem of learning semantic representations for word images, which encode the semantics of the word and reduce the vocabulary gap that exists between the query and the retrieved results. The first contribution of this thesis is a simple technique to generate large amounts of synthetic data, useful for pre-training deep neural networks. This led to the creation of the IIIT-HWS dataset, which is now widely used in the document community.
The other major contributions of this thesis are: (a) the design of a deep convolutional architecture (named HWNet) for learning an efficient holistic representation for word images, (b) a joint embedding scheme to project words and textual strings onto a common subspace, and (c) a novel form of word image representation which respects the word form along with its semantic meaning. The learned representations are evaluated under the tasks of word spotting and word recognition. We report state-of-the-art performance on popular datasets under both modern/historical and handwritten/printed document images while keeping the representation size compact. Finally, in order to validate the proposed representations of this thesis, we present some interesting use cases such as (i) finding similarity between a pair of handwritten document images, (ii) searching for keywords from online lecture videos, and (iii) building a word retrieval system for Indic scripts.

Year of completion:	November 2020
Advisor :	C V Jawahar

Related Publications

Siddhant Bansal, Praveen Krishnan and C.V. Jawahar - Improving Word Recognition using Multiple Hypotheses and Deep Embeddings, The 25th International Conference on Pattern Recognition (ICPR 2021), Milano. [PDF]
Kartik Dutta, Praveen Krishnan, Minesh Mathew and C.V. Jawahar - Improving CNN-RNN Hybrid Networks for Handwriting Recognition, The 16th International Conference on Frontiers in Handwriting Recognition, Niagara Falls, USA. [PDF]
Kartik Dutta, Praveen Krishnan, Minesh Mathew and C.V. Jawahar - Towards Spotting and Recognition of Handwritten Words in Indic Scripts, The 16th International Conference on Frontiers in Handwriting Recognition, Niagara Falls, USA. [PDF]
Kartik Dutta, Praveen Krishnan, Minesh Mathew and C.V. Jawahar - Localizing and Recognizing Text in Lecture Videos, The 16th International Conference on Frontiers in Handwriting Recognition, Niagara Falls, USA. [PDF]
Vijay Rowtula, Praveen Krishnan, C.V. Jawahar - POS Tagging and Named Entity Recognition on Handwritten Documents, ICON, 2018. [PDF]
Praveen Krishnan, Kartik Dutta and C. V. Jawahar - Word Spotting and Recognition using Deep Embedding, Proceedings of the 13th IAPR International Workshop on Document Analysis Systems, 24-27 April 2018, Vienna, Austria. [PDF]
Kartik Dutta, Praveen Krishnan, Minesh Mathew and C. V. Jawahar - Offline Handwriting Recognition on Devanagari using a new Benchmark Dataset, Proceedings of the 13th IAPR International Workshop on Document Analysis Systems, 24-27 April 2018, Vienna, Austria. [PDF]
Kartik Dutta, Praveen Krishnan, Minesh Mathew, and C. V. Jawahar - Towards Accurate Handwritten Word Recognition for Hindi and Bangla, National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), 2017. [PDF]
Praveen Krishnan and C.V Jawahar - Matching Handwritten Document Images, The 14th European Conference on Computer Vision (ECCV) – Amsterdam, The Netherlands, 2016. [PDF]
Praveen Krishnan, Kartik Dutta and C.V Jawahar - Deep Feature Embedding for Accurate Recognition and Retrieval of Handwritten Text, 15th International Conference on Frontiers in Handwriting Recognition, Shenzhen, China (ICFHR), 2016. [PDF]
Anshuman Majumdar, Praveen Krishnan and C.V. Jawahar - Visual Aesthetic Analysis for Handwritten Document Images, 15th International Conference on Frontiers in Handwriting Recognition, Shenzhen, China (ICFHR), 2016. [PDF]
Praveen Krishnan, Naveen Sankaran, Ajeet Kumar Singh and C. V. Jawahar - Towards a Robust OCR System for Indic Scripts, Proceedings of the 11th IAPR International Workshop on Document Analysis Systems, 7-10 April 2014, Tours-Loire Valley, France. [PDF]
Praveen Krishnan and C V Jawahar - Bringing Semantics in Word Image Retrieval, Proceedings of the 12th International Conference on Document Analysis and Recognition, 25-28 Aug. 2013, Washington DC, USA. [PDF]
Praveen Krishnan, Ravi Sekhar, C V Jawahar - Content Level Access to Digital Library of India Pages, Proceedings of the 8th Indian Conference on Vision, Graphics and Image Processing, 16-19 Dec. 2012, Bombay, India. [PDF]

Downloads

Geometry-aware methods for efficient and accurate 3D reconstruction

Rajvi Shah

Abstract

Advancements in 3D sensing and reconstruction have enabled a huge leap in modeling large-scale environments from monocular images using structure from motion (SfM) and simultaneous localization and mapping (SLAM) algorithms. SfM and SLAM based 3D reconstruction has applications for digital archival and modeling of real-world objects and environments, visual localization for geo-tagging and information retrieval, and mapping and navigation for robotic and autonomous driving applications. In this thesis, we address problems in the area of large-scale structure from motion (SfM) for 3D reconstruction and localization. We introduce new methods for improving the efficiency and accuracy of the state-of-the-art pipeline for structure from motion. A large-scale SfM pipeline deals with large unorganized collections of images pertaining to a particular geographical site. These image collections are either formed by retrieving relevant images from the Internet using textual queries, or captured for the specific purpose of 3D modeling, mapping, and navigation. Internet image collections tend to be noisier and present more challenges for reconstruction than datasets captured with the specific intention to reconstruct. In this thesis, we propose methods that help with organizing these large, unstructured, and noisy image collections into a structure that is useful for SfM methods, a match-graph (or a view-graph). We first propose a geometry-aware two-stage approach for pairwise image matching that is both more efficient and superior in quality of correspondences. We then extend this idea to the SfM pipeline and present an iterative multistage framework for coarse-to-fine 3D reconstruction. Finally, we suggest that a key to solving many of the reconstruction problems is to address the problem of filtering and improving the view-graph in a way that is specific to the underlying problem.
To this effect, we propose a unified framework for view-graph selection and show its application to achieve multiple reconstruction objectives.

Year of completion:	December 2020
Advisor :	P J Narayanan

Related Publications

Rajvi Shah, Visesh Chari and P.J. Narayanan - View-graph Selection Framework for SfM, European Conference on Computer Vision (ECCV), 2018, Munich, Germany. [PDF]
Saumya Rawat, Siddhartha Gairola, Rajvi Shah and P.J. Narayanan - Find Me a Sky: A Data-Driven Method for Color-Consistent Sky Search and Replacement, International Conference on Multimedia Modeling 2018: 216-228. [PDF]
Ishit Mehta, Parikshit Sakurikar, Rajvi Shah, P J Narayanan - SynCam: Capturing sub-frame synchronous media using smartphones, IEEE International Conference on Multimedia and Expo (ICME-2017). [PDF]
Aditya Singh, Saurabh Saini, Rajvi Shah and P. J. Narayanan - Learning to hash-tag videos with Tag2Vec, Proceedings of the Tenth Indian Conference on Computer Vision, Graphics and Image Processing. ACM, 2016. [PDF]
Aditya Singh, Saurabh Saini, Rajvi Shah and P J Narayanan - From Traditional to Modern: Domain Adaptation for Action Classification in Short Social Video Clips, 38th German Conference on Pattern Recognition (GCPR 2016), Hannover, Germany, September 12-15 2016. [PDF]
Rajvi Shah, Vanshika Srivastava, P.J. Narayanan - Geometry-aware Feature Matching for Structure from Motion Applications, Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 06-09 Jan 2015, Waikoloa Beach, USA. [PDF]
Rajvi Shah, Aditya Deshpande, P.J. Narayanan - Multistage SFM: Revisiting Incremental Structure from Motion, Proceedings of the International Conference on 3D Vision, 08-11 Dec 2014, Tokyo, Japan. [PDF]
Rajvi Shah and P. J. Narayanan - Interactive video manipulation using object trajectories and scene backgrounds, IEEE Transactions on Circuits and Systems for Video Technology 23.9 (2013): 1565-1576. [PDF]
Rajvi Shah and P.J. Narayanan - Trajectory based Video Object Manipulation, Proceedings of IEEE International Conference on Multimedia and Expo (ICME 2011), 11-15 July, 2011, Barcelona, Spain. [PDF]
Rajvi Shah, P. J. Narayanan and Kishore Kothapalli - GPU-Accelerated Genetic Algorithms, Proceedings of The 3rd Workshop on Parallel Architectures for Bio-inspired Algorithms (WPABA) in conjunction with Parallel Architectures for Compilation Techniques (PACT'10), 11-15 Sep. 2010, Vienna, Austria. [PDF]

Downloads

Anatomical Structure Segmentation in Retinal Images with Some Applications in Disease Detection

Arunava Chakravarty

Abstract

Color Fundus (CF) imaging and Optical Coherence Tomography (OCT) are widely used by ophthalmologists to visualize the retinal surface and the intra-retinal tissue layers respectively. An accurate segmentation of the anatomical structures in these images is necessary to visualize and quantify the structural deformations that characterize retinal diseases such as Glaucoma, Diabetic Macular Edema (DME) and Age-related Macular Degeneration (AMD). In this thesis, we propose different frameworks for the automatic extraction of the boundaries of relevant anatomical structures in CF and OCT images. First, we address the problem of the segmentation of Optic Disc (OD) and Optic Cup (OC) in CF images to aid in the detection of Glaucoma. We propose a novel boundary-based Conditional Random Field (CRF) framework to jointly extract both the OD and OC boundaries in a single optimization step. Although OC is characterized by the relative drop in depth from the OD boundary, the 2D CF images lack explicit depth information. The proposed method estimates depth from CF images in a supervised manner using a coupled, sparse dictionary trained on a set of image-depth map (derived from OCT) pairs. Since our method requires a single CF image per eye during testing, it can be employed in the large-scale screening of glaucoma where expensive 3D imaging is unavailable. Next, we consider the task of intra-retinal tissue layer segmentation in cross-sectional OCT images, which is essential to quantify the morphological changes in specific tissue layers caused by AMD and DME. We propose a supervised CRF framework to jointly extract the eight layer boundaries in a single optimization step.
In contrast to the existing energy minimization based segmentation methods that employ handcrafted energy cost terms, we linearly parameterize the total CRF energy to allow the appearance features for each layer and the relative weights of the shape priors to be learned in a joint, end-to-end manner by employing the Structural Support Vector Machine formulation. The proposed method can aid ophthalmologists in the quantitative analysis of structural changes in the retinal tissue layers for clinical practice and large-scale clinical studies. Next, we explore Level Set based Deformable Models (LDM), a popular energy minimization framework for medical image segmentation. We model the LDM as a novel Recurrent Neural Network (RNN) architecture called the Recurrent Active Contour Evolution Network (RACE-net). In contrast to the existing LDMs, RACE-net allows the curve evolution velocities to be learned in an end-to-end manner while minimizing the number of network parameters, computation time and memory requirements. Consistent performance of RACE-net on a diverse set of segmentation tasks such as the extraction of OD and OC in CF images, cell nuclei in histopathological images and left atrium in cardiac MRI volumes demonstrates its utility as a generic, off-the-shelf architecture for biomedical segmentation. Segmentation has many clinical applications, especially in the area of computer aided diagnostics. We close this dissertation with some illustrative applications of the segmentation information. We consider the case of disease detection in CF and OCT images. We explore and benchmark two classification strategies for the detection of glaucoma from CF images based on deep learning and handcrafted features respectively. Both methods use a combination of appearance features directly derived from the CF image and structural features derived from the OD and OC segmentation.
We also construct a Normative Atlas for the macular OCT volumes to aid in the detection of AMD. The irregularities in the Bruch’s membrane caused by the deposit of drusen are modeled as deviations from the normal anatomy represented by the Atlas Mean Template.

Year of completion:	November 2019
Advisor :	Jayanthi Sivaswamy

Related Publications

Chakravarty A, Gaddipati DJ and Jayanthi Sivaswamy - Construction of a Retinal Atlas for Macular OCT Volumes, ICIAR 2018, Portugal. [PDF]
Chakravarty A and Jayanthi Sivaswamy - RACE-net: A Recurrent Neural Network for Biomedical Image Segmentation, IEEE Journal of Biomedical and Health Informatics. [PDF]
Arunava Chakravarty and Jayanthi Sivaswamy - Joint optic disc and cup boundary extraction from monocular fundus images, Computer Methods and Programs in Biomedicine 147 (2017): 51-61. [PDF]
Arunava Chakravarty and Jayanthi Sivaswamy - End-to-End Learning of a Conditional Random Field for Intra-retinal Layer Segmentation in Optical Coherence Tomography, Annual Conference on Medical Image Understanding and Analysis. Springer, Cham, 2017. [PDF]
Chakrabarty L, Joshi G.D, Chakravarty A, Raman G.V., Krishnadas S.R. and Sivaswamy J. (2016) - Automated Detection of Glaucoma From Topographic Features of the Optic Nerve Head in Color Fundus Photographs, Journal of Glaucoma, 25(7), pp. 590-597. [PDF]
Arunava Chakravarty and Jayanthi Sivaswamy - Glaucoma Classification with a Fusion of Segmentation and Image-based Features, Proc. of IEEE International Symposium on Bio-Medical Imaging (ISBI), 2016, 13 - 16 April, 2016, Prague. [PDF]
Ujjwal, Arunava Chakravarty, Jayanthi Sivaswamy - An Assistive Annotation System for Retinal Images, Proceedings of the IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 16-19 April 2015. [PDF]
Jayanthi Sivaswamy, S.R. Krishnadas, Arunava Chakravarty, Gopal Datt Joshi, Ujjwal, Tabish Abbas Syed - A Comprehensive Retinal Image Dataset for the Assessment of Glaucoma from the Optic Nerve Head Analysis, JSM Biomed Imaging Data Papers 2(1): 1004 (BIDP: 2015). [PDF]
Arunava Chakravarty, Jayanthi Sivaswamy - Coupled Sparse Dictionary for Depth-based Cup Segmentation from Single Color Fundus Image, Proceedings of the MICCAI 2014, 14-18 Sep 2014, Boston, USA. [PDF]
M J J P Van Grinsven, Arunava Chakravarty, Jayanthi Sivaswamy, T. Theelen, B. Van Ginneken, C I Sanchez - A Bag of Words Approach for Discriminating between Retinal Images Containing Exudates or Drusen, Proceedings of the IEEE 10th International Symposium on Biomedical Imaging: From Nano to Macro, 07-11 April 2013, San Francisco, CA, USA. [PDF]
Ujjwal, K. Sai Deepak, Arunava Chakravarty, Jayanthi Sivaswamy - Visual Saliency based Bright Lesion Detection and Discrimination in Retinal Images, Proceedings of the IEEE 10th International Symposium on Biomedical Imaging: From Nano to Macro, 07-11 April 2013, San Francisco, CA, USA. [PDF]
Arunava Chakravarty, Jayanthi Sivaswamy - A Novel Approach for Quantification of Retinal Vessel Tortuosity using Quadratic Polynomial Decomposition, Proceedings of the Indian Conference on Medical Informatics and Telemedicine, 28-30 Mar. 2013, Kharagpur, India. [PDF]

Downloads
