Robust Registration For Video Mosaicing Using Camera Motion Properties
Pulkit Parikh
In recent years, video mosaicing has emerged as an important problem in the domains of computer vision and computer graphics. Video mosaics find applications in many popular areas, including video compression, virtual environments, and panoramic photography. Mosaicing is the process of generating a single, large, integrated image by combining the visual cues from multiple images. The composite image, called a mosaic, provides a wide field of view without compromising image resolution. When the input images are the frames of a video, the process is called video mosaicing. Mosaicing primarily consists of two key components: image registration and image compositing. The focus of this work is on image registration - the process of estimating a transformation that relates the frames of the video. In addition to mosaicing, registration has a wide range of other applications in ortho-rectification, scene transfer, video in-painting, etc. We employ the homography, the general 2D projective transformation, for image registration. Typically, a homography is estimated from a set of matched feature points extracted from the frames of the input video.
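To make the registration step concrete, the following is a minimal numpy sketch of homography estimation from matched feature points via the standard Direct Linear Transform (DLT); it is an illustration of the general technique, not the specific estimator used in this thesis, and the function name is illustrative.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 homography H mapping src points to dst points
    (plain DLT; needs at least 4 non-degenerate correspondences)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two rows to the linear system A h = 0.
        A.append([-x, -y, -1,  0,  0,  0, u * x, u * y, u])
        A.append([ 0,  0,  0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of A with smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalize so H[2,2] == 1
```

In practice, such an estimator is wrapped in a robust scheme (e.g., RANSAC) so that mismatched feature points do not corrupt the result, which is exactly where poor correspondences make pair-wise registration fragile.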
Video mosaicing has traditionally been viewed as a problem of registering (and stitching) only successive video frames. While some recent global alignment approaches make use of information stemming from non-consecutive pairs of video frames, the registration of each frame pair is typically done independently. No real emphasis has been laid on how the imaging process and the camera motion relate the pair-wise homographies. Therefore, accurate registration, and in turn mosaicing, remains a challenging task, especially in the face of poor feature point correspondences. For example, mosaicing a desert video whose frames have too little texture to provide feature points (e.g., corners), or mosaicing in the presence of repetitive texture that leads to many mismatched points, is highly error-prone.
It is known that the camera capturing the video frames to be mosaiced does not undergo arbitrary motion. The trajectory of the camera is almost always smooth. Examples where such motion is seen include aerial videos taken by a camera mounted on an aircraft, and videos captured by robots, automated vehicles, etc. We propose an approach which exploits this smoothness constraint to refine outlier homographies, i.e., homographies that are detected to be erroneous. In many scenarios, especially in machine-controlled environments, the camera motion not only is continuous but also follows a model (say, a linear motion model), giving us much more concrete and precise information. In this thesis, we derive relationships between the homographies in a video sequence, under many practical scenarios, for various camera motion models. In other words, we translate the camera motion model into a global homography model whose parameters characterize the homography between every pair of frames. We present a generic methodology which uses this global homography model for accurate registration.
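One way such inter-frame relationships can be used to refine an outlier homography is through composition: since homographies chain as H_{i,i+2} = H_{i+1,i+2} H_{i,i+1}, a suspect pair-wise homography can be re-derived from a skip-frame homography and its neighbour. The sketch below illustrates only this generic composition identity, under the assumption that the skip-frame estimate is reliable; it is not the thesis's refinement algorithm, and the function name is illustrative.

```python
import numpy as np

def refine_pairwise(H_skip, H_next):
    """Recover the pair-wise homography H_{i,i+1} from the skip-frame
    homography H_skip = H_{i,i+2} and its neighbour H_next = H_{i+1,i+2},
    using the chain rule H_{i,i+2} = H_{i+1,i+2} @ H_{i,i+1}."""
    H = np.linalg.inv(H_next) @ H_skip
    return H / H[2, 2]  # normalize so H[2,2] == 1
```

A motion-model-based approach generalizes this idea: instead of trusting individual neighbouring estimates, all pair-wise homographies are constrained jointly by the parameters of a single global homography model derived from the camera motion.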
The above-mentioned derivations and algorithms have been implemented, verified, and tested on various datasets. The analysis and comparative results demonstrate significant improvement over existing approaches in terms of accuracy and robustness. Superior-quality mosaics have been produced using our algorithms in the presence of many commonly observed issues, such as texture-less frames and frames containing repetitive texture, with applicability to indoor (e.g., mosaicing a map) as well as outdoor (e.g., representing an aerial video as a mosaic) settings. For quantitative evaluation of mosaics, we have devised a novel, computationally efficient quality measure. The quantitative results are consistent with visual perception, further attesting to the superiority of our approach. In a nutshell, the premise that the properties of the camera motion can serve as an important additional cue for improved video mosaicing has been empirically validated.
Year of completion: 2007