Computational Video Editing and Re-editing
Advances in video capture technology have made recording videos very easy. Smaller, affordable cameras that boast high-end specifications and capture video at very high resolutions (4K, 8K, and even 16K) have made high-quality recording accessible to everyone. Although this makes recording straightforward and effortless, a major part of the video production process, editing, remains labor intensive and requires skill and expertise. This thesis takes a step towards automating the editing process and making it less time consuming. Specifically, (1) we explore a novel approach for automatically editing stage performances so that both the context of the scene and close-up details of the actors are shown, and (2) we propose a new method to optimally retarget videos to any desired aspect ratio while retaining salient regions derived from gaze tracking.

Recordings of stage performances are easy to capture with a high-resolution camera but difficult to watch because the actors' faces are too small. We present an approach that automatically creates a split-screen video, transforming such recordings to show both the context of the scene and close-up details of the actors. Given a static recording of a stage performance and tracking information about the actors' positions, our system generates videos showing a focus+context view based on computed close-up camera motions using crop-and-zoom. The key to our approach is to compute these camera motions so that they form cinematically valid close-ups, and to ensure that the views of the different actors are properly coordinated and presented. We pose the computation of camera motions as a convex optimization that creates detailed views and smooth movements, subject to cinematic constraints such as not cutting faces at the edge of the frame.
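As a rough illustration of posing camera motion as a smoothing optimization, the sketch below fits a virtual pan path to a 1-D actor track by penalizing the path's second differences. This is a simplified quadratic stand-in, not the thesis' actual objective: the real formulation uses L1-style penalties (yielding static, linear, and parabolic segments) together with cinematic constraints such as keeping faces away from the frame edge. The function name and the regularization weight here are illustrative assumptions.

```python
import numpy as np

def smooth_camera_path(track, lam=50.0):
    """Smooth a 1-D actor track (e.g. x-coordinates of a tracked face)
    into a virtual-camera pan path.

    Minimizes  sum_t (x_t - track_t)^2 + lam * sum_t (x''_t)^2,
    a quadratic stand-in for the thesis' convex objective.
    Closed form: solve (I + lam * D2^T D2) x = track.
    """
    track = np.asarray(track, dtype=float)
    n = len(track)
    # Second-difference operator D2 of shape (n-2, n)
    D2 = np.zeros((n - 2, n))
    for i in range(n - 2):
        D2[i, i:i + 3] = [1.0, -2.0, 1.0]
    A = np.eye(n) + lam * (D2.T @ D2)
    return np.linalg.solve(A, track)
```

A perfectly linear track is a fixed point of this smoother (its second differences are already zero), while sudden jumps in the track get spread out into gradual pans.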
Additional constraints link the close-up views of each actor, causing them to merge seamlessly when actors are close. The generated views are placed in a resulting layout that preserves the spatial relationships between the actors. This eliminates the manual labor and expertise required both for capturing the performance and for editing it later; instead, the split screen of focus+context views allows viewers to actively decide what to attend to. We demonstrate our results on a variety of staged theater and dance performances.

Videos are captured at a specific aspect ratio chosen for the target screen on which they are meant to be viewed, which results in an inferior viewing experience when they are watched on screens with a different aspect ratio. We present an approach to automatically retarget any given video to any desired aspect ratio while preserving its most salient regions, obtained using gaze tracking. Our algorithm performs editing with cut, pan, and zoom operations by optimizing the path of a cropping window within the original video while seeking to (i) preserve salient regions and (ii) adhere to the principles of cinematography. The algorithm has two steps. The first uses dynamic programming to find a cropping window path that maximizes gaze inclusion within the window and also locates plausible new cuts (if required). The second performs regularized convex optimization on the path obtained via dynamic programming to produce a smooth cropping window path composed of piecewise constant, linear, and parabolic segments. We test our re-editing algorithm on a diverse collection of movie and theater sequences. A study conducted with 16 users confirms that our retargeting algorithm results in a superior viewing experience compared to gaze-driven re-editing and letterboxing methods, especially for wide-angle static-camera recordings.
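The dynamic-programming stage of the retargeting pipeline can be sketched as a path search over discretized window positions, trading gaze inclusion against window movement. This is a simplified illustration: the function name, the movement penalty, and the scoring rule are assumptions, and the sketch omits cut detection and the second, convex-smoothing stage.

```python
def crop_window_dp(gaze, frame_w, win_w, step=10, move_cost=0.5):
    """Choose a cropping-window left edge per frame by dynamic programming.

    `gaze` is a list (one entry per frame) of lists of gaze x-coordinates.
    Each window position earns the number of gaze points it contains,
    minus a penalty proportional to how far the window moved.
    Returns one left-edge coordinate per frame.
    """
    positions = list(range(0, frame_w - win_w + 1, step))

    def score(pts, left):
        # Count gaze points falling inside the window [left, left + win_w]
        return sum(left <= x <= left + win_w for x in pts)

    # dp[t][j]: best accumulated score ending at position j in frame t
    dp = [[score(gaze[0], p) for p in positions]]
    back = []
    for t in range(1, len(gaze)):
        row, brow = [], []
        for p in positions:
            best, arg = max(
                (dp[-1][k] - move_cost * abs(p - q) / step, k)
                for k, q in enumerate(positions)
            )
            row.append(best + score(gaze[t], p))
            brow.append(arg)
        dp.append(row)
        back.append(brow)

    # Backtrack the best window path from the final frame
    j = max(range(len(positions)), key=lambda k: dp[-1][k])
    path = [positions[j]]
    for brow in reversed(back):
        j = brow[j]
        path.append(positions[j])
    return path[::-1]
```

When the gaze distribution is static, the movement penalty keeps the window still; when gaze drifts, the optimal path pans to follow it, which is the behavior the smoothing stage then regularizes into clean segments.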
|Year of completion:||November 2018|
|Advisor:||Vineet Gandhi|