Modelling and Recognition of Dynamic Events in Video

Karteek Alahari

Computer Vision algorithms, which mainly focussed on analyzing image data till the early 1980's, have now matured to handle video data more efficiently. In the past, computational barriers have limited the complexity of video processing applications. As a consequence, most systems were either too slow to be practical, or succeeded by restricting themselves to very controlled situations. With the availability of faster computing resources over the past couple of decades, video processing applications have gained popularity in the computer vision research community. Moreover, the advances in data capturing, storage, and communication technologies have made vast amounts of video data available to consumer and enterprise applications. This has naturally created a demand for video analysis research.

Video sequences typically consist of long-temporal objects - called events - which usually extend over tens or hundreds of frames. They provide useful cues for analysis of video information, including, eventbased video indexing, browsing, retrieval, clustering, segmentation, recognition, summarization, etc. The state-of-the-art techniques seldom use the event information inherent in videos for all these problems. They either simply recognize the events or use primitive features to address other video analysis issues. Furthermore, due to the large volume of video data we need efficient models to capture the essential content in the events. This involves removing the acceptable statistical variability across all the videos. These requirements create the need for learning-based approaches for video analysis.

In this thesis, we aim to address the video analysis problems by modelling and recognizing the dynamic events in them. We propose a model to learn efficient representation of events for analyzing continuous video sequences and demonstrate its applicability for summarizing them. Further, we observe that all parts of a video sequence may not be equally important for the classification task. Based on the characteristics of each part we compute its potential in influencing the decision criterion. Another observation we make is that, a feature set appropriate for one event may be completely irrelevant for another. Hence, an adaptive feature selection scheme is essential. We present an approach to learn an optimal combination of spatial and temporal based on the events being analyzed. Finally, we describe some of our work on unsupervised framework for video analysis.


Year of completion:  2006
 Advisor :

C. V. Jawahar & P. J. Narayanan

Related Publications

  • Karteek Alahari, Satya Lahari Putrevu and C.V. Jawahar - Learning Mixtures of Offline and Online Features for Handwritten Stroke Recognition, Proc. 18th IEEE International Conference on Pattern Recognition(ICPR'06), Hong Kong, Aug 2006, Vol. III, pp.379-382. [PDF]

  • Karteek Alahari and C.V. Jawahar - Dynamic Events as Mixtures of Spatial and Temporal Features, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338 pp.540-551, 2006. [PDF]

  • Karteek Alahari and C.V. Jawahar - Discriminative Actions for Recognising Events, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338 pp.552-563, 2006. [PDF]

  • Karteek Alahari, Satya Lahari P and C. V. Jawahar - Discriminant Substrokes for Online Handwriting Recognition, Proceedings of Eighth International Conference on Document Analysis and Recognition(ICDAR), Seoul, Korea 2005, Vol 1, pp 499-503. [PDF]

  • S. S. Ravi Kiran, Karteek Alahari and C. V. Jawahar, Recognizing Human Activities from Constituent Actions, Proceedings of the National Conference on Communications (NCC), Jan. 2005, Kharagpur, India, pp. 351-355. [PDF]

  • Karteek Alahari, Ravi Kiran Sarvadevabhatla, C. V. Jawahar - A Spatiotemporal Model for Recognizing Human Activities from Constituent Actions, in Pattern Recognition, Journal of Pattern Recognition Society. (submitted)