Real-Time Video Processing for Dynamic Content Creation


Sudheer Achary

Abstract

Autonomous camera systems are vital in capturing dynamic events and creating engaging videos. However, existing filtering techniques used to stabilize and smooth camera trajectories often fail to replicate the natural behavior of human camera operators. To address these challenges, our work proposes novel approaches for real-time camera trajectory optimization and gaze-guided video editing. We introduce two online filtering methods: CineConvex and CineCNN. CineConvex utilizes a sliding window-based convex optimization formulation, while CineCNN employs a convolutional neural network as an encoder-decoder model. Both methods are motivated by cinematographic principles, producing smooth and natural camera trajectories. Evaluation on basketball and stage performance datasets demonstrates superior performance over previous methods and baselines, both quantitatively and qualitatively. With a minor latency of half a second, CineConvex operates at approximately 250 frames per second (fps), while CineCNN achieves 1000 fps, making both highly suitable for real-time applications. In the realm of video editing, we present Real Time GAZED, a real-time adaptation of the GAZED framework. It enables users to create professionally edited videos in real time. Comparative evaluations against baseline methods, including the non-real-time GAZED, demonstrate that Real Time GAZED achieves similar editing results, ensuring high-quality video output. Furthermore, a user study confirms the aesthetic quality of the video edits produced by Real Time GAZED. With the advancements in real-time camera trajectory optimization and video editing presented, the demand for immediate and dynamic content creation in industries such as live broadcasting, sports coverage, news reporting, and social media content creation can be met more efficiently. The key advantages offered by these real-time approaches are the elimination of time-consuming post-production processes and the ability to deliver high-quality videos in today’s fast-paced digital landscape.
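To make the idea of sliding-window convex filtering with a fixed look-ahead concrete, here is a minimal sketch. It is not the CineConvex formulation, which the abstract does not spell out. It substitutes a simple convex quadratic objective (data fidelity plus a penalty on frame-to-frame changes) solved per window with the Thomas algorithm. The function names, the 30 fps frame rate implied by a 15-frame look-ahead, and the values of `past` and `lam` are all assumptions made for illustration.

```python
def smooth_window(y, lam):
    """Minimise sum (x_i - y_i)^2 + lam * sum (x_{i+1} - x_i)^2 over x.

    This quadratic objective is a stand-in chosen for illustration only.
    """
    n = len(y)
    if n == 1:
        return list(y)
    # The normal equations (I + lam * D^T D) x = y are tridiagonal.
    diag = [1.0 + lam * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    off = -lam
    # Thomas algorithm: forward elimination, then back substitution.
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = y[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (y[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def online_smooth(track, past=30, lookahead=15, lam=50.0):
    """Filter a 1D camera track frame by frame.

    Frame t is smoothed using at most `past` earlier frames and `lookahead`
    later frames, so a live system would emit it `lookahead` frames late
    (15 frames is half a second at an assumed 30 fps).
    """
    out = []
    for t in range(len(track)):
        lo = max(0, t - past)
        hi = min(len(track), t + lookahead + 1)
        window = smooth_window(track[lo:hi], lam)
        out.append(window[t - lo])  # keep only the estimate for frame t
    return out
```

Calling `online_smooth` on a jittery list of per-frame pan positions returns a track of the same length with reduced frame-to-frame variation. A larger `lam` gives a steadier camera at the cost of slower response to real motion.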

Year of completion: November 2023
Advisor: Vineet Gandhi


    Nerve Block Target Localization and Needle Guidance for Autonomous Robotic Ultrasound Guided Regional Anesthesia


Abhishek Tyagi

    Abstract

Ultrasound-guided regional anesthesia (UGRA) involves approaching target nerves with a needle in real time, enabling precise deposition of the drug with increased success rates and fewer complications. Development of autonomous robotic systems capable of administering UGRA is desirable for remote settings and localities where anesthesiologists are unavailable. Real-time segmentation of nerves, needle tip localization and needle trajectory extrapolation are required for developing such a system. In the first part of this thesis, we developed models to localize nerves in the ultrasound domain using a large dataset. Our prospective study enrolled 227 subjects who were systematically scanned for brachial plexus nerves in various settings using three different ultrasound machines to create a dataset of 227 unique videos. In total, 41,000 video frames were annotated by experienced anesthesiologists using partial automation with object tracking and active contour algorithms. Four baseline neural network models were trained on the dataset and their performance was evaluated for object detection and segmentation tasks. Generalizability of the best-suited model was then tested on the datasets constructed from separate ultrasound scanners with and without fine-tuning. The results demonstrate that deep learning models can be leveraged for real-time segmentation of the brachial plexus in neck ultrasonography videos with high accuracy and reliability. Using these nerve segmentation predictions, we define automated anesthesia needle targets by fitting an ellipse to the nerve contours. The second part of this thesis focuses on localization of the needles and development of a framework to guide the needles toward their targets. For the segmentation of the needle, a neural network pre-trained on natural RGB images is first fine-tuned on a large ultrasound dataset for domain transfer and then adapted for the needle using a small dataset. The segmented needle’s trajectory angle is calculated using the Radon transform and the trajectory is extrapolated from the needle tip. The intersection of the extrapolated trajectory with the needle target guides the needle navigation for drug delivery. The average error in the needle trajectory’s angle was 2°, the average error in the trajectory’s distance from the center of the image was 10 pixels (2 mm), and the average error in the needle tip position was 19 pixels (3.8 mm), which is within the acceptable range of 5 mm as per experienced anesthesiologists. The entire dataset has been released publicly for further study by the research community.
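The geometric step described above, extrapolating the trajectory from the needle tip and checking where it meets the elliptical target, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the thesis code: the function names and the ellipse parameterisation (center, semi-axes, tilt) are hypothetical, the trajectory angle is taken as given rather than computed with a Radon transform, and the 0.2 mm per pixel scale is inferred from the reported figures (10 pixels for 2 mm).

```python
import math

# Assumed scale, inferred from the abstract: 10 pixels correspond to 2 mm.
MM_PER_PIXEL = 2.0 / 10.0


def pixels_to_mm(pixels):
    """Convert an image-space distance in pixels to millimetres."""
    return pixels * MM_PER_PIXEL


def ray_hits_ellipse(tip, angle_deg, center, semi_axes, tilt_deg=0.0):
    """Return the first point where the ray from `tip` meets the ellipse,
    or None if the extrapolated trajectory misses the target."""
    theta = math.radians(angle_deg)
    dx, dy = math.cos(theta), math.sin(theta)
    phi = math.radians(tilt_deg)
    cos_p, sin_p = math.cos(phi), math.sin(phi)
    a, b = semi_axes

    def to_unit_circle(vx, vy):
        # Rotate by -tilt, then scale each axis so the ellipse becomes a unit circle.
        rx = vx * cos_p + vy * sin_p
        ry = -vx * sin_p + vy * cos_p
        return rx / a, ry / b

    px, py = to_unit_circle(tip[0] - center[0], tip[1] - center[1])
    ux, uy = to_unit_circle(dx, dy)
    # Solve |p + t * u|^2 = 1 for t, a quadratic in the ray parameter.
    qa = ux * ux + uy * uy
    qb = 2.0 * (px * ux + py * uy)
    qc = px * px + py * py - 1.0
    disc = qb * qb - 4.0 * qa * qc
    if disc < 0:
        return None
    root = math.sqrt(disc)
    for t in ((-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)):
        if t >= 0:  # only points ahead of the needle tip count
            return (tip[0] + t * dx, tip[1] + t * dy)
    return None
```

For example, `ray_hits_ellipse((0, 0), 0, (10, 0), (2, 1))` returns a point on the near edge of the target, while a needle pointing away from the ellipse yields `None`. Under the assumed scale, `pixels_to_mm(19)` gives about 3.8 mm, which is below the 5 mm tolerance quoted in the abstract.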

Year of completion: November 2023
Advisor: Jayanthi Sivaswamy


      Data exploration, Playing styles, and Gameplay for Cooperative Partially Observable games: Pictionary as a case study


      Kiruthika Kannan

      Abstract

Cooperative human-human communication becomes challenging when restrictions such as differences in communication modality and limited time are imposed. In this thesis, we present the popular cooperative social game Pictionary as an online multimodal test bed to explore the dynamics of human-human interactions in such settings. Pictionary is a multiplayer game where the players attempt to convey a word or phrase through drawing. The restriction imposed on the mode of communication gives rise to intriguing diversity and creativity in the players’ responses. To explore player activity in Pictionary, we developed an online browser-based Pictionary application and used it to collect a Pictionary dataset. We conduct an exploratory analysis of the dataset, examining the data across three domains: global session-related statistics, target word-related statistics, and user-related statistics. We also present our interactive dashboard to visualize the analysis results. We identify attributes of player interactions that characterize cooperative gameplay. Using these attributes, we find stable role-specific playing style components independent of game difficulty. In terms of gameplay and the larger context of cooperative partially observable communication, our results suggest that too much interaction or unbalanced interaction negatively impacts game success. Additionally, the playing style components discovered via our analysis align with select player personality types proposed in existing frameworks for multiplayer games. Furthermore, this thesis explores atypical sketch content within the Pictionary dataset. We present various baseline models for detecting such atypical content. We conduct a comparative analysis of three baseline models, namely BiLSTM+CRF, SketchsegNet+, and modified CRAFT. Results indicate that the image segmentation-based deep neural network outperforms recurrent models that rely on stroke features or stroke coordinates as input.

Year of completion: November 2023
Advisor: Ravi Kiran Sarvadevabhatla


        Security from uncertainty: Designing privacy-preserving verification methods using Noise


        Praguna Manvi

        Abstract

Biometric authentication plays an increasingly prominent role in today’s products and services for verifying an individual’s identity. It is not only efficient but also practical, as it establishes a unique link to an individual through their physical and behavioral characteristics [43]. Unlike conventional authentication mechanisms like passwords or documents, biometric traits are inherent to each individual, eliminating the need to memorize additional information [64]. However, the security and privacy of biometric templates used in authentication remain primary concerns, as biometric data is strongly and irrevocably tied to an individual [42]. In the context of remote authentication, Secure Multiparty Computation (SMC) offers a powerful solution. SMC enables two parties to interactively compute a function using their private inputs without disclosing any information except for the output itself [19]. This approach ensures that biometric template comparison is carried out in a privacy-preserving manner, enhancing both security and privacy in authentication services. In this thesis, we introduce a unique approach to iris, fingerprint, and face verification by incorporating “noise” into the authentication process. In our work, “noise” refers to signals obtained from non-discriminatory or unreliable regions of biometric characteristics. Our extensive empirical evaluation reveals a correlation among noise features, and we leverage this correlation in a novel Secure Two-Party Computation (STPC) design. This STPC design operates on quantified uncertainty between noise features, providing information-theoretic security. Our approach has low accuracy degradation, practical computational complexity, and wide applicability, making it suitable for practical real-time applications.

Year of completion: December 2023
Advisor: Anoop M Namboodiri


          Towards Enhancing Semantic Segmentation in Resource Constrained Settings


          Ashutosh Mishra

          Abstract

Understanding scene semantics well enough to fully automate decision-making in self-driving cars is becoming a crucial task in computer vision. With recent progress in autonomous driving and the many semantic segmentation datasets proposed for road scene understanding, semantic segmentation of road scenes has become an important problem. However, training semantic segmentation models is resource-intensive, since it requires multi-GPU training, and this becomes a bottleneck for quickly reproducing results and building understanding. This thesis introduces challenges and provides solutions to reduce the training time of segmentation models by introducing two small-scale datasets. Additionally, the thesis explores the potential of employing neural architecture search and automatic pruning techniques to create efficient segmentation modules in resource-constrained settings. Chapter 2 of the thesis introduces the problem of semantic segmentation and discusses some deep learning approaches to supervised semantic segmentation. We briefly discuss the different metrics used and also touch upon the statistics of various datasets available in the literature for training semantic segmentation models. Chapter 3 of the thesis explains the need for a dataset based on the Indian road scenario. Most of the datasets in the literature are captured in Western settings with well-defined traffic participants, delineated boundaries, etc., assumptions that seldom hold in the Indian setting. We describe the annotation pipeline, along with the quality check framework used to annotate the dataset. Although the IDD dataset [121] caters to the Indian setting, it is still quite resource-intensive in terms of GPU computation. Hence, there is a need for a low-resolution dataset with fewer labels for rapid prototyping. We introduce our proposed datasets and provide a detailed set of experiments and statistical comparisons with the existing datasets to substantiate our claim regarding the usefulness of the proposed solution. We also show through experiments that models trained using our datasets can be deployed on low-resource hardware such as a Raspberry Pi. At the end of this chapter, we also look into the significance of the proposed datasets in facilitating challenges at two prominent conferences in 2019: the International Conference on Computer Vision (ICCV) and the National Conference on Computer Vision, Pattern Recognition, Image Processing, and Graphics (NCVPRIPG). These challenges aimed to address semantic segmentation in resource-constrained settings, inviting innovative architectures capable of achieving decent accuracy on the proposed datasets. We also discuss the potential application of these datasets in teaching semantic segmentation through a course of notebooks introducing traditional as well as deep learning-based methods to perform segmentation. These notebooks are plug-and-play: the first three can run on a laptop CPU, while the fourth requires GPU access.

Year of completion: January 2024
Advisor: C V Jawahar, Girish Varma

