Object-centric Video Navigation and Manipulation

Rajvi Shah

The past decade has seen tremendous advances in consumer electronics and web technology. The proliferation of digital cameras and the popularity of content-sharing sites have caused rapid growth in the creation and distribution of images and videos by home users. Enhancing and manipulating captured images is fairly popular among common users, owing to the availability of numerous easy-to-use photo editing utilities such as Instagram, Picasa, and Photo Gallery. In comparison, video manipulation remains less popular because easy-to-use yet powerful video editing platforms are lacking. Basic video editing platforms for home users are simple and intuitive, but they provide limited functionality, such as splitting and merging videos or adding captions and audio. Professional video editing platforms are rich in functionality, but they demand considerable technical expertise, and a novice user is usually discouraged by their complex interactions and cumbersome processing. Moreover, traditional video editing interfaces model and represent a video as a collection of frames against a timeline, whereas in a user's perception a video carries more meaningful semantics such as objects, actions, events, and interactions. This gap between perception and representation makes object-centric manipulation of videos an unnatural and laborious task.

In this thesis, we attempt to bridge the gap between the power and usability of video manipulation interfaces by using computer vision techniques. We propose a representation based on three high-level video semantics (scene mosaic, object motion, and camera motion) to enable simple and meaningful interaction for object-centric navigation and manipulation of long shot videos. We build an extended field-of-view mosaic of the video scene and represent object motion in this scene mosaic using 3D space-time trajectories. We define novel object and camera manipulation operations using object trajectories as the basic interaction elements.
The use of object trajectories as basic video semantics replaces complex interface elements with interactive curve manipulation operations. The object operations allow users to perform various temporal manipulations on video objects by interactively manipulating their trajectories. For example, users can delay or advance an object by dragging its trajectory along the timeline, or replicate an object by creating multiple copies of its trajectory. The camera operations model the camera as a movable and scalable aperture and allow users to simulate camera pan, tilt, and zoom by creating new aperture trajectories. In combination, object and camera operations allow users to perform a number of high-level video manipulations in a simple 'click and drag' fashion.
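The trajectory and aperture operations described above can be illustrated with a minimal sketch. This is not the thesis implementation: the `Trajectory` and `Aperture` types, the per-frame `(t, x, y)` sample layout in mosaic coordinates, and the linear interpolation of the aperture are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical layout: one (t, x, y) sample per frame in which the object
# is visible, with (x, y) expressed in scene-mosaic coordinates.
Sample = Tuple[int, float, float]


@dataclass
class Trajectory:
    object_id: str
    samples: List[Sample]


def shift_in_time(traj: Trajectory, frames: int) -> Trajectory:
    """Delay (frames > 0) or advance (frames < 0) an object on the timeline."""
    shifted = [(t + frames, x, y) for t, x, y in traj.samples]
    return Trajectory(traj.object_id, shifted)


def replicate(traj: Trajectory, delays: List[int]) -> List[Trajectory]:
    """Create one time-shifted copy of the trajectory per requested delay."""
    return [
        Trajectory(f"{traj.object_id}#{i}", shift_in_time(traj, d).samples)
        for i, d in enumerate(delays, start=1)
    ]


@dataclass
class Aperture:
    cx: float      # centre x in mosaic coordinates
    cy: float      # centre y in mosaic coordinates
    width: float
    height: float


def aperture_path(start: Aperture, end: Aperture, n_frames: int) -> List[Aperture]:
    """Linearly interpolate between two apertures (an assumed, simple model).

    Moving the centre simulates pan/tilt; changing the size simulates zoom.
    """
    if n_frames < 2:
        raise ValueError("n_frames must be at least 2")
    path = []
    for i in range(n_frames):
        a = i / (n_frames - 1)
        path.append(Aperture(
            start.cx + a * (end.cx - start.cx),
            start.cy + a * (end.cy - start.cy),
            start.width + a * (end.width - start.width),
            start.height + a * (end.height - start.height),
        ))
    return path


walker = Trajectory("walker", [(0, 10.0, 50.0), (1, 12.0, 50.0), (2, 14.0, 50.0)])
delayed = shift_in_time(walker, 5)      # object appears 5 frames later
copies = replicate(walker, [10, 20])    # two extra instances of the object
zoom_in = aperture_path(Aperture(100, 100, 320, 240), Aperture(200, 100, 160, 120), 3)
```

In this sketch, dragging a trajectory along the timeline corresponds to a single call to `shift_in_time`, and rendering a frame would amount to cropping the scene mosaic, with objects composited at their trajectory positions, to the aperture for that frame.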


Year of completion: 2012
Advisor: P. J. Narayanan

Related Publications

  • Rajvi Shah and P. J. Narayanan - Trajectory based Video Object Manipulation, in Proceedings of IEEE International Conference on Multimedia and Expo (ICME 2011), 11-15 July 2011, Barcelona, Spain.

  • Rajvi Shah and P. J. Narayanan - Scene Background and Object Trajectories based Interactive Video Manipulation, in IEEE Transactions on Circuits and Systems for Video Technology, Under Review.