Multiscale Two-view Stereo using Convolutional Neural Networks for Unrectified Images
Y N Pramod Pramod
The two-view stereo problem is a special case of the multiview stereo problem in which only two views, or orientations, are available for estimating depth or disparity. Given the constrained nature of this setup, traditional algorithms assume that either the intrinsic or the extrinsic parameters are available in advance, so that the homographies that rectify the two images can be built. Stereo rectification allows the epipolar constraint to be enforced, so that the corresponding projections of a 3D point can be searched for in one dimension along the epipolar lines. When the calibration matrices are not available, the two-view stereo problem reduces to estimating the fundamental matrix. The condition number, which measures how sensitive a function's output is to changes in its input, is high for a fundamental matrix estimated with the 8-point algorithm. Deep learning methods have become sought-after solutions to numerous computer vision problems, as state-of-the-art research has demonstrated the learning capability of neural networks. We explore stereo correspondence in an uncalibrated setting by estimating a depth map from a pair of unrectified stereo images. We seek an end-to-end solution in which the relative depths of pixels with respect to a single viewpoint are extracted with the aid of a second view. Extending the correlation layer devised in the FlowNet architecture, we design a modified FlowNet architecture that regresses depth maps, with multiscale correlations added to handle textureless surfaces and surfaces with repetitive texture.
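As a rough illustration of the conditioning remark above, the following NumPy sketch builds the linear system of the 8-point algorithm from synthetic correspondences and compares its conditioning in raw pixel coordinates and after Hartley-style normalization. Everything specific here is an assumption made for illustration and is not taken from the thesis: the synthetic scene, the camera parameters, the helper names, the use of normalization as a point of comparison, and the choice of the ratio between the largest and the eighth singular value as the conditioning measure.

```python
import numpy as np

def design_matrix(x1, x2):
    """One row per correspondence, so that A @ F.ravel() = 0 encodes x2^T F x1 = 0."""
    u1, v1 = x1[:, 0], x1[:, 1]
    u2, v2 = x2[:, 0], x2[:, 1]
    ones = np.ones_like(u1)
    return np.stack([u2 * u1, u2 * v1, u2,
                     v2 * u1, v2 * v1, v2,
                     u1, v1, ones], axis=1)

def hartley_normalize(x):
    """Centre the points and scale them so the mean distance to the origin is sqrt(2)."""
    centroid = x.mean(axis=0)
    mean_dist = np.linalg.norm(x - centroid, axis=1).mean()
    return (x - centroid) * (np.sqrt(2.0) / mean_dist)

def effective_condition(A):
    """Largest over eighth singular value (the ninth is ~0 for exact data)."""
    s = np.linalg.svd(A, compute_uv=False)
    return s[0] / s[7]

# Hypothetical synthetic scene and cameras (assumed values, for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform([-1.0, -1.0, 3.0], [1.0, 1.0, 9.0], size=(50, 3))
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([1.0, 0.0, 0.2])

def project(P):
    """Pinhole projection of camera-frame 3D points to pixel coordinates."""
    p = (K @ P.T).T
    return p[:, :2] / p[:, 2:3]

x1 = project(X)            # first view: camera at the origin
x2 = project(X @ R.T + t)  # second view: rotated and translated camera

A_raw = design_matrix(x1, x2)
A_norm = design_matrix(hartley_normalize(x1), hartley_normalize(x2))
cond_raw = effective_condition(A_raw)
cond_norm = effective_condition(A_norm)
print(f"raw pixels: {cond_raw:.3e}   normalized: {cond_norm:.3e}")
```

In raw pixel coordinates the columns of the design matrix differ by several orders of magnitude, which is one common explanation for the poor conditioning of the plain 8-point estimate.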
Because no suitable dataset is available for deep learning on unrectified images, we construct a constrained setup of turntable sequences for this purpose using Google 3D Warehouse models. Following the concepts of attention modelling, we implement an architecture that combines correlations computed at multiple resolutions through a simple element-wise multiplication, which helps the network resolve correspondences on textureless and repetitively textured surfaces. Our experiments show both qualitative and quantitative improvements in the depth maps over the original FlowNet architecture, and they provide a solution to unrectified stereo depth estimation, whereas most algorithms in the literature compute depth maps or disparities from stereo-rectified image pairs.
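The idea of combining correlations from several resolutions by element-wise multiplication can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the architecture from the thesis: the function names, the two-level pyramid, average pooling, nearest-neighbour upsampling, and the displacement strides are all hypothetical choices. The fine level samples displacements with stride 2 and the half-resolution level with stride 1, so that matching channels refer to the same displacement in full-resolution pixels.

```python
import numpy as np

def correlation(f1, f2, max_disp, stride=1):
    """Correlation volume between feature maps f1, f2 of shape (C, H, W).

    Returns an array of shape ((2 * max_disp + 1) ** 2, H, W); each channel is
    the channel-averaged dot product of f1 at (y, x) with f2 at (y + dy, x + dx).
    """
    _, H, W = f1.shape
    pad = max_disp * stride
    offsets = range(-pad, pad + 1, stride)
    f2p = np.pad(f2, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    out = []
    for dy in offsets:
        for dx in offsets:
            shifted = f2p[:, pad + dy:pad + dy + H, pad + dx:pad + dx + W]
            out.append((f1 * shifted).mean(axis=0))
    return np.stack(out)

def avg_pool2(f):
    """2x2 average pooling; assumes even H and W."""
    C, H, W = f.shape
    return f.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

def upsample2(c):
    """Nearest-neighbour upsampling by a factor of 2 along H and W."""
    return np.repeat(np.repeat(c, 2, axis=1), 2, axis=2)

def multiscale_correlation(f1, f2, max_disp=2):
    """Gate the fine correlation with the upsampled coarse correlation."""
    fine = correlation(f1, f2, max_disp, stride=2)
    coarse = correlation(avg_pool2(f1), avg_pool2(f2), max_disp, stride=1)
    return fine * upsample2(coarse)

# Usage with random stand-in features (hypothetical shapes).
rng = np.random.default_rng(0)
f1 = rng.standard_normal((8, 16, 16))
f2 = rng.standard_normal((8, 16, 16))
out = multiscale_correlation(f1, f2)
print(out.shape)
```

The multiplication acts like a soft gate: a displacement keeps a strong response only when both the fine and the coarse level support it, which is one plausible way for coarse context to disambiguate matches on textureless or repetitive regions.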
|Year of completion:||May 2020|
|Advisor:||Anoop M Namboodiri|