A New Algorithm for Ray Tracing Synthetic Light Fields on the GPU


Udyan Khurana

Abstract

Ray tracing is one of the two major approaches to 3D rendering. Images produced through ray tracing show a higher degree of visual realism and are more accurate than other 3D rendering methods such as rasterisation. It is primarily used to generate photo-realistic imagery, as it is based on tracing the path of light through the scene space. Ray tracing was initially an offline algorithm and was seen as more suitable for applications that required high-quality rendering, where render-time constraints were not as strict. Still images, animated films and television series thus used ray tracing to render visual effects like reflections, shadows and refractions accurately. Computer Generated Imagery (CGI) was first used in the science fiction movie Westworld in 1973. In 1995, 'Toy Story' became the first film to be fully developed using computer animation. RenderMan, a ray tracing engine developed by Pixar Studios, made this possible. At that time, rendering enough frames to make an 80-minute film took more than 800,000 machine hours. With massive increases in computational power and the invention of Graphics Processing Units (GPUs), things have improved considerably, leading to renewed interest in ray tracing research. GPUs gave a huge boost to ray tracing because of its inherently parallel nature. Though GPUs accelerate almost all parts of the ray tracing pipeline, it was still considered impractical for interactive, real-time rendering applications like games. With the announcement of specialised hardware units for ray tracing in the Turing GPU microarchitecture by NVIDIA, real-time ray tracing is now believed to be possible.
Camera technology has also advanced considerably over the years. With standard optical cameras, it was important to focus accurately on the subject, as slight errors would cause a substantial loss of quality in the desired portions of the image. Digital cameras today can adjust focus and aperture on their own to focus accurately on a subject. When simulating a camera-like setup using computer graphics, we do not have the flexibility of refocusing after rendering is completed; the only option is to render the setup again with the correct setting. The light field representation offers a viable solution to this problem. A light field is a 4D function that captures all the radiance information of a scene. Traditionally, light fields were created by image-based rendering mechanisms, which reconstruct the 4D space using pre-captured imagery from various views and employ refocusing algorithms to generate output images. Plenoptic cameras capture the information of all light coming towards the camera and use it to generate images at any focus setting. Handheld plenoptic cameras, like the Lytro camera series, also capture light fields using a microlens array between the sensor and the main lens, but physical constraints have limited their spatial and angular resolutions.
The first part of this thesis presents a GPU-based synthetic light field rendering framework that is robust and physically accurate. It can create light fields of very high resolutions, orders of magnitude higher than those currently available in commercial plenoptic cameras. We show that light field rendering is possible and viable in a synthetic rendering setup, and that high-quality images at any desired focus and aperture setting can be generated in reasonable times with a very low memory footprint. We explain the theory behind synthetic light field creation and the different representations possible for a 4D light field, and show results based on our implementation, compared with state-of-the-art ray tracing frameworks like POVRay.
In the second part of the thesis, we attempt to solve the problem of light field storage. The 4D light field is inherently rich in information but is very bulky to store. We address this issue by separating the light field creation and output image generation passes of a synthetic light field framework. The thesis presents a compact representation of the 4D light slabs using a video compression codec and demonstrates different quality-size combinations using this representation. We demonstrate the equivalence of the standard light field camera representation, called the sub-aperture representation, with the light slab representation for synthetic light fields. We use this to exhibit the capability of our framework not only to trace light fields of very high resolutions, but also to store them in memory. The required images can therefore be generated independently of the ray tracing pass at a very small cost.
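The abstract describes generating images at any desired focus and aperture from a 4D light field after rendering. As a rough illustration of the underlying idea only (not the thesis's GPU implementation), the sketch below performs classic shift-and-add refocusing over the sub-aperture images of a light field; the array layout, the `alpha` refocus parameter and the integer-shift approximation are assumptions made for illustration.

```python
import numpy as np

def refocus(light_field, alpha, aperture_radius=None):
    """Shift-and-add refocusing of a 4D light field.

    light_field     : array of shape (U, V, S, T) holding sub-aperture images,
                      indexed by angular coordinates (u, v) and spatial (s, t).
    alpha           : relative focal depth; 1.0 keeps the original focal plane.
    aperture_radius : optional limit on which sub-aperture images are averaged,
                      simulating a smaller synthetic aperture.
    """
    U, V, S, T = light_field.shape
    cu, cv = (U - 1) / 2.0, (V - 1) / 2.0
    out = np.zeros((S, T), dtype=np.float64)
    count = 0
    for u in range(U):
        for v in range(V):
            du, dv = u - cu, v - cv
            if aperture_radius is not None and du * du + dv * dv > aperture_radius ** 2:
                continue  # outside the synthetic aperture
            # Shift each sub-aperture image proportionally to its angular offset.
            shift_s = (1.0 - 1.0 / alpha) * du
            shift_t = (1.0 - 1.0 / alpha) * dv
            img = light_field[u, v]
            shifted = np.roll(np.roll(img, int(round(shift_s)), axis=0),
                              int(round(shift_t)), axis=1)
            out += shifted
            count += 1
    return out / max(count, 1)
```

Varying `alpha` moves the synthetic focal plane, and shrinking `aperture_radius` deepens the depth of field, which is the refocusing behaviour the abstract refers to.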

Year of completion: April 2020
Advisor: P J Narayanan


Image Representations for Style Retrieval, Recognition and Background Replacement Tasks


Siddhartha Gairola

Abstract

Replacing overexposed or dull skies in outdoor photographs is a desirable photo manipulation. It is often necessary to color correct the foreground after replacement to make it consistent with the new sky. Methods have been proposed to automate the process of sky replacement and color correction. However, color correction is often unwanted by the artist or may produce unrealistic results. Style similarity is an important measure for many applications such as style transfer, fashion search and art exploration. However, computational modeling of style is a difficult task owing to its vague and subjective nature. Most methods for style-based retrieval use supervised training with a pre-defined categorization of images according to style. While this paradigm is suitable for applications where style categories are well defined and curating large datasets according to such a categorization is feasible, in several other cases such a categorization is either ill-defined or does not exist. In this thesis, we primarily study various image representations and their applications in understanding visual style and automatic background replacement.
First, we propose a data-driven approach to sky replacement that avoids color correction by finding a diverse set of skies that are consistent in color and natural illumination with the query image foreground. Our database consists of ∼1200 natural images spanning many outdoor categories. Given a query image, we retrieve the most consistent images from the database according to L2 similarity in feature space and produce candidate composites. The candidates are re-ranked based on realism and diversity. We used pre-trained CNN features and a rich set of hand-crafted features that encode color statistics, structural layout and natural illumination statistics, but observed color statistics to be the most effective for this task. We share our findings on feature selection and show qualitative results and a user-study based evaluation to demonstrate the effectiveness of the proposed method.
Next, we propose an unsupervised protocol for learning a neural embedding of the visual style of images. Our protocol for learning style-based representations does not leverage categorical labels but a proxy measure for forming triplets of anchor, similar and dissimilar images. Using these triplets, we learn a compact style embedding that is useful for style-based search and retrieval. The learned embeddings outperform other unsupervised representations for the style-based image retrieval task on six datasets that capture different meanings of style. We also show that by fine-tuning the learned features with dataset-specific style labels, we obtain the best results for the image style recognition task on five of the six datasets. To the best of our knowledge, ours is the first work that provides a comprehensive review and evaluation of style representations in an unsupervised setting. Our findings, along with the curated outdoor scene database, would be useful to the community for future research in the direction of sky-search and sky-replacement.
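The abstract describes learning a compact style embedding from triplets of anchor, similar and dissimilar images without categorical labels. A minimal sketch of that training objective is given below, assuming backbone features are precomputed; the `StyleEmbedder` module, its dimensions and the margin value are illustrative assumptions, not the thesis's actual network or proxy measure.

```python
import torch
import torch.nn as nn

class StyleEmbedder(nn.Module):
    """Maps precomputed backbone features to a compact, L2-normalised style embedding."""
    def __init__(self, in_dim=2048, embed_dim=128):
        super().__init__()
        self.proj = nn.Linear(in_dim, embed_dim)

    def forward(self, feats):
        z = self.proj(feats)
        return nn.functional.normalize(z, dim=-1)

def train_step(model, optimiser, anchor, positive, negative, margin=0.2):
    """One triplet-loss update: pull 'similar' images together, push 'dissimilar' ones apart."""
    loss_fn = nn.TripletMarginLoss(margin=margin)
    optimiser.zero_grad()
    loss = loss_fn(model(anchor), model(positive), model(negative))
    loss.backward()
    optimiser.step()
    return loss.item()
```

At retrieval time, the same kind of embedding can be compared by L2 (or cosine) distance to rank database images against a query, which is how the sky-search and style-retrieval steps in the abstract operate at a high level.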

Year of completion: April 2020
Advisor: P J Narayanan


Towards developing a multiple modality fusion technique for automatic detection of Glaucoma


Divya Jyothi Gaddipati

Abstract

Glaucoma is a major eye disease which, when untreated, can gradually lead to irreversible loss of vision. The underlying causes are a loss of retinal nerve fibres (resulting in a thinning of the layer and enlargement of the optic cup) and peripapillary atrophy. Since these changes occur without any symptoms in the initial stages, it is difficult to diagnose the disease early. Hence, the development of computer-aided diagnostic (CAD) systems for early detection and treatment of the disease has attracted the attention of many medical experts and researchers alike. Optical Coherence Tomography (OCT) and fundus photography are two widely used retinal imaging techniques for obtaining the structural information of the eye that helps to analyze and detect the disease. Existing automated systems rely largely on fundus images for assessment of glaucoma due to their fast acquisition and lower cost. OCT images provide vital and unambiguous information for understanding the changes occurring in the retina, specifically in the retinal nerve fiber layer and the optic nerve head, which are essential for disease assessment. However, the high cost of OCT is a deterrent to deployment in large-scale screening. Hence, the focus of this thesis is to investigate the potential of integrating the two retinal imaging techniques, which provide complementary information about the eye, for developing an automatic glaucoma screening system.
Firstly, we propose a deep learning approach operating directly on 3D OCT volumes for glaucoma assessment, which shows promising results, demonstrating the prominence of the highly discriminative features learnt from OCT for automated glaucoma detection. Next, we present a novel CAD solution wherein both OCT and fundus modality images are leveraged to learn a model that can map fundus features to the OCT feature space. We show how this model can subsequently be used to detect glaucoma given an image from only one modality (fundus), thus enabling the automated screening operation to be executed on a large scale. The results show that fundus to OCT feature space mapping is an attractive option for glaucoma detection.
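The second contribution learns a mapping from fundus features to the OCT feature space so that screening needs only a fundus image at test time. The schematic sketch below shows one way such a setup can be wired together; the feature dimensions, the regressor, the classifier and the combined loss are illustrative assumptions, not the thesis's architecture.

```python
import torch
import torch.nn as nn

class FundusToOCTMapper(nn.Module):
    """Regresses OCT-space features from fundus-derived features."""
    def __init__(self, fundus_dim=512, oct_dim=256):
        super().__init__()
        self.map = nn.Sequential(
            nn.Linear(fundus_dim, 512), nn.ReLU(),
            nn.Linear(512, oct_dim),
        )

    def forward(self, fundus_feats):
        return self.map(fundus_feats)

mapper = FundusToOCTMapper()
classifier = nn.Linear(256, 2)          # glaucoma vs. normal, in the OCT feature space
mse, ce = nn.MSELoss(), nn.CrossEntropyLoss()

def train_step(fundus_feats, oct_feats, labels, optimiser):
    """Paired training: mapped fundus features should match OCT features and classify correctly."""
    optimiser.zero_grad()
    mapped = mapper(fundus_feats)
    loss = mse(mapped, oct_feats) + ce(classifier(mapped), labels)
    loss.backward()
    optimiser.step()
    return loss.item()

# At screening time only a fundus image is needed:
#   prediction = classifier(mapper(fundus_feats)).argmax(dim=-1)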

Year of completion: February 2020
Advisor: Jayanthi Sivaswamy


Deep-Learning Features, Graphs and Scene Understanding


Abhijeet Kumar

Abstract

Scene understanding has been a major aspiration of computer vision from its early days. Its root lies in enabling the computer/robot/machine to understand, interpret and manipulate visual data, similar to what an average human eye does in front of a natural or artificial localized scene. This ennoblement of the machine has a widespread impact, ranging from surveillance, aerial imaging and autonomous navigation to smart cities, and scene understanding has thus remained an active area of research in the last decade. In the last decade, the scope of problems in the scene understanding community has broadened from image annotation, image captioning and image segmentation to object detection, dense image captioning, instance segmentation etc. Advanced problems like autonomous navigation, panoptic segmentation, video summarization and multi-person tracking in crowded scenes have also surfaced in this arena and are being vigorously attempted. Deep learning has played a major role in this advancement. The performance metrics in some of these tasks have more than tripled in the last decade itself, but these tasks remain far from solved. The success originating from deep learning can be attributed to the learned features. In simple words, features learned by a convolutional neural network trained for annotation are in general far more suited for captioning than a non-deep-learning method trained for captioning.
Taking a cue from this deep learning trend, we dived into the domain of scene understanding with a focus on the utilization of pre-learned features from other similar domains. We focus on two tasks in particular: automatic (multi-label) image annotation and (road) intersection recognition. Automatic image annotation is one of the earliest problems in scene understanding and refers to the task of assigning (multiple) labels to an image based on its content. Intersection recognition, by contrast, belongs to the new era of problems in scene understanding and refers to the task of identifying an intersection from varied viewpoints under varied weather and lighting conditions. We chose these significantly different tasks to broaden the scope and generalizing capability of the results we compute. Both image annotation and intersection recognition pose some common challenges such as occlusion, perspective variations and distortions. While focusing on the image annotation task, we further narrowed our domain by concentrating on graph-based methods. We again chose two different paradigms: a multiple kernel learning based non-deep-learning approach and a deep-learning based approach, with a focus on bringing out the contrast between them. Through quantitative and qualitative results we show slightly boosted performance from the above-mentioned paradigms. The intersection recognition task is relatively new in the field. Most of the work in the field focuses on place recognition, which utilizes only single images. We focus on temporal information, i.e., the traversal of the intersection/place as seen from a camera mounted on a vehicle. Through experiments we show a performance boost in intersection recognition from the inclusion of temporal information.
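For intersection recognition the abstract emphasises using the temporal traversal rather than single images. One common way to exploit that is to pool per-frame CNN features over the traversal into a single descriptor and match it by nearest neighbour; the sketch below illustrates only this generic idea, and the average pooling, normalisation and cosine matching are assumptions rather than the thesis's exact model.

```python
import numpy as np

def traversal_descriptor(frame_features):
    """Aggregate per-frame CNN features of one traversal into a single descriptor.

    frame_features : array of shape (num_frames, feature_dim)
    """
    pooled = frame_features.mean(axis=0)              # simple temporal average pooling
    return pooled / (np.linalg.norm(pooled) + 1e-8)   # L2-normalise for cosine matching

def recognise(query_frames, gallery):
    """Match a query traversal against known intersections by cosine similarity.

    gallery : dict mapping intersection id -> precomputed traversal descriptor
    """
    q = traversal_descriptor(query_frames)
    return max(gallery, key=lambda k: float(np.dot(q, gallery[k])))
```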

Year of completion: January 2020
Advisor: Avinash Sharma


Computational Imaging Techniques to Recover Omni3D Structure & Surface Properties


Rajat Aggarwal

Abstract

The process of imaging converts a 3D world into a 2D image. This process is inherently lossy, making the problem of understanding the world ill-posed. During the imaging process, a light ray emanating from a source bounces off different surfaces, interacting with them according to the surface properties (albedo, color, specularity) before reaching the camera. The structure that we see in an image is primarily based on the final surface from which the light was reflected, except when that surface is highly specular (mirror-like). In this work, we explore imaging techniques that recover both the structure and the surface properties of a scene.
For structure recovery, we extend the idea of stereo imaging and present a practical solution to capture a complete 360° panorama using a single camera. Current approaches either use a moving camera to capture multiple images of a scene, which are then stitched together to form the final panorama, or use multiple synchronized cameras. A moving camera limits the solution to static scenes, while multi-camera solutions require dedicated calibrated setups. Our approach improves upon the existing solutions in two significant ways: it solves the problem using a single camera, thus minimizing the calibration problem and providing the ability to convert any digital camera into a stereo panoramic capture device, and it captures all the light rays required for stereo panoramas in a single frame using a compact, custom-designed mirror, thus making the design practical to manufacture and easier to use. We analyze the optimality of the design and present panoramic stereo and depth estimation results.
Methods for structure recovery, including stereo, are often fooled when the surface is highly specular. To alleviate this, we propose an active-illumination based method to detect and segment mirror-like surfaces in a scene. In computer vision, many active illumination techniques employ projector-camera systems to extract useful information from scenes. Known illumination patterns are projected onto the scene and their deformations in the captured images are then analyzed. We observe that the local frequencies in the captured pattern for mirror-like surfaces are different from those of the projected pattern. This property allows us to design a custom projector-camera system that segments mirror-like surfaces by analyzing the local frequencies in the captured images. The system projects a sinusoidal pattern and captures the images from the projector's point of view. We present segmentation results for scenes including multiple reflections and inter-reflections from mirror-like surfaces. The method can further be used to separate direct and global components for mirror-like surfaces by illuminating the non-mirror-like objects separately. We show how our method is also useful for accurate estimation of the shape of the non-mirror-like regions in the presence of mirror-like regions in a scene.
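The mirror-segmentation idea rests on the observation that the local frequency of the captured sinusoid differs on mirror-like surfaces from the projected frequency. The toy sketch below illustrates that kind of analysis with a per-patch FFT on a grayscale captured image; the patch size, tolerance and frequency estimate are illustrative assumptions, not the thesis's actual projector-camera pipeline.

```python
import numpy as np

def dominant_frequency(patch):
    """Return the dominant horizontal spatial frequency (cycles/pixel) of a patch."""
    spectrum = np.abs(np.fft.rfft(patch.mean(axis=0)))  # average rows, FFT along columns
    spectrum[0] = 0.0                                    # ignore the DC component
    return np.argmax(spectrum) / patch.shape[1]

def segment_mirror_like(captured, projected_freq, patch=32, tol=0.25):
    """Flag patches whose local frequency deviates from the projected sinusoid.

    captured       : 2D grayscale image seen from the projector's point of view
    projected_freq : frequency (cycles/pixel) of the projected sinusoidal pattern
    """
    h, w = captured.shape
    mask = np.zeros((h // patch, w // patch), dtype=bool)
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            block = captured[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
            f = dominant_frequency(block)
            mask[i, j] = abs(f - projected_freq) > tol * projected_freq
    return mask
```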

Year of completion: November 2019
Advisor: Anoop M Namboodiri

