
Word Hashing for Efficient Search In Document Image Collections


Anand Kumar

A large number of document image collections are now being scanned and made available over the Internet or in digital libraries. Effective access to these sources is limited by the lack of efficient retrieval schemes. Text search methods require efficient and robust optical character recognizers (OCRs), which are presently unavailable for Indian languages. Word spotting, i.e. word image matching, may instead be used to retrieve word images in response to a word image query. The approaches used for word spotting so far, dynamic time warping and nearest neighbor search, tend to be slow for large collections of books: direct matching of images is expensive because of the complexity of matching, and thus impractical for large databases. In general, indexing and retrieval methods for document images cluster similar words and build indexes over representatives of the clusters. Building such a clustering-based index takes very long, since the clustering step requires complex image matching procedures. This problem is solved by directly hashing word image representations.
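The dynamic time warping comparison mentioned above can be sketched as follows. This is an illustrative toy, not the thesis's implementation: it assumes each word image has already been reduced to a 1-D sequence of per-column ink-profile values, and uses absolute difference as the local cost.

```python
# Illustrative DTW matching of two word images via per-column profile
# features (feature extraction is assumed; a and b are 1-D sequences).

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # skip a column of a
                                 D[i][j - 1],      # skip a column of b
                                 D[i - 1][j - 1])  # match columns
    return D[n][m]
```

Each comparison costs O(nm) in the sequence lengths, and a linear scan pays that cost against every word in the collection; this is exactly why the direct-matching approaches above do not scale to large book collections.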

An efficient mechanism for indexing and retrieval in large document image collections is presented in this thesis. First, document images are segmented into words. Features are then computed at the word level and indexed. Word retrieval is done very efficiently with content-sensitive hashing (CSH), which builds on an approximate nearest neighbor search technique called locality sensitive hashing (LSH). Word images are hashed into hash tables using word-level features, with content-sensitive hash functions chosen so that similar words fall into the same hash bucket with high probability. The sub-linear time CSH scheme makes the search very fast without degrading accuracy. Experiments on a Telugu collection of books by Kalidasa, the classical Indian poet of antiquity, demonstrate that word images can be retrieved in a few milliseconds, making search in document image collections practical.
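A minimal sketch of the hashing idea, under the assumption of random-hyperplane LSH over word-image feature vectors (the thesis's actual hash functions and features may differ): each table hashes a vector to a bit-key, and similar vectors collide in at least one table with high probability.

```python
# Sketch of locality-sensitive hashing over word-image feature vectors.
# Feature extraction is assumed; vectors are plain lists of floats.
import random

class LSHIndex:
    def __init__(self, dim, n_tables=4, n_bits=8, seed=0):
        rng = random.Random(seed)
        # One set of random hyperplanes per hash table.
        self.planes = [[[rng.gauss(0, 1) for _ in range(dim)]
                        for _ in range(n_bits)] for _ in range(n_tables)]
        self.tables = [{} for _ in range(n_tables)]

    def _key(self, planes, v):
        # The sign of the dot product with each hyperplane gives one bit.
        return tuple(int(sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0)
                     for p in planes)

    def add(self, v, label):
        for planes, table in zip(self.planes, self.tables):
            table.setdefault(self._key(planes, v), []).append(label)

    def query(self, v):
        # Union of candidates from all tables; similar vectors collide often,
        # so only a small bucket is examined instead of the whole collection.
        out = set()
        for planes, table in zip(self.planes, self.tables):
            out.update(table.get(self._key(planes, v), []))
        return out
```

Query time depends on the bucket sizes rather than the collection size, which is the source of the sub-linear search behaviour described above.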

 

Year of completion:  2008
 Advisors: C. V. Jawahar & R. Manmatha

Related Publications

  • Anand Kumar, C.V. Jawahar and R. Manmatha - Efficient Search in Document Image Collections, Proceedings of the 8th Asian Conference on Computer Vision (ACCV'07), Part I, LNCS 4843, pp. 586-595, Tokyo, Japan, 18-22 November 2007. [PDF]

  • C.V. Jawahar and Anand Kumar - Content-level Annotation of Large Collection of Printed Document Images, Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR), Brazil, 23-26 September 2007. [PDF]

  • Anand Kumar, A. Balasubramanian, Anoop M. Namboodiri and C.V. Jawahar - Model-Based Annotation of Online Handwritten Datasets, International Workshop on Frontiers in Handwriting Recognition (IWFHR'06), 23-26 October 2006, Centre de Congrès Atlantia, La Baule, France. [PDF]

 



Proxy Based Compression of Depth Movies


Pooja Verlani (homepage)

Sensors for 3D data are common today; these include multi-camera systems, laser range scanners, etc. Some of them are suitable for real-time capture of the shape and appearance of dynamic events. The 2-1/2D model of an aligned depth map and image, called a Depth Image, has been popular in Image Based Modeling and Rendering (IBMR). Capturing the 2-1/2D geometric structure and photometric appearance of dynamic scenes is possible today. Time-varying depth and image sequences, called Depth Movies, can extend IBMR to dynamic events. The captured event contains aligned sequences of depth maps and textures, which are often streamed to a distant location for immersive viewing. Applications of such systems include virtual-space tele-conferencing, remote 3D immersion, 3D entertainment, etc. We study a client-server model for tele-immersion where captured or stored depth movies are sent from a server to multiple remote clients on demand. Depth movies consist of dynamic depth maps and texture maps. Multiview image compression and video compression have been studied earlier, but there has been no study of dynamic depth map compression. This thesis contributes towards dynamic depth map compression for efficient transmission in a server-client 3D tele-immersive environment. Dynamic depth map data is bulky and needs efficient compression schemes. Immersive applications require time-varying sequences of depth images from multiple cameras to be encoded and transmitted; at the remote site, the 3D scene is regenerated by rendering the whole scene. Depth movies of a generic 3D scene from multiple cameras are thus very heavy to send over a network, given the available bandwidth.
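The 2-1/2D Depth Image representation above can be made concrete with a small sketch: back-projecting a per-pixel depth map into 3D points through a pinhole camera model. The intrinsics (fx, fy, cx, cy) and the "zero means no sample" convention are illustrative assumptions, not the thesis's exact format.

```python
# Minimal sketch of the 2-1/2D "Depth Image" idea: a depth map plus camera
# intrinsics determines a 3D point per pixel (texture gives its colour).

def backproject(depth, fx, fy, cx, cy):
    """depth: 2-D list of z values; returns a list of (X, Y, Z) points."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:          # 0 or negative marks "no depth sample" here
                continue
            X = (u - cx) * z / fx   # pinhole back-projection
            Y = (v - cy) * z / fy
            points.append((X, Y, z))
    return points
```

Rendering at the client amounts to projecting such points into a novel view, which is why transmitting depth faithfully (the compression problem studied here) directly affects the rendered quality.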


This thesis presents a scheme to compress depth movies of human actors using a parametric proxy model for the underlying action. We use a generic articulated human model as the proxy to represent the human in action, with the joint angles of the model parametrizing the proxy at each time instant. The proxy represents a common prediction of the scene structure. The difference between the captured depth and the depth of the proxy, called the residue, is used to represent the scene, exploiting spatial coherence. A few variations of this algorithm are presented in this thesis. We experimented with bit-wise compression of the residues and analyzed the quality of the generated 3D scene. Differences in residues across time are used to exploit temporal coherence. Intra-frame coded frames and difference-coded frames together provide random access and high compression. We show results on several synthetic and real actions to demonstrate the compression ratio and the resulting quality using depth-based rendering of the decoded scene. We present the articulation fitting tool, the compression module with different algorithms, and the server-client system with several variants for the user. The thesis first explains the concepts of 3D reconstruction by image based modeling and rendering, compression of such 3D representations, and teleconferencing; it then proceeds to depth images and movies, followed by the main algorithms, examples, experiments and results.
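The residue idea above can be sketched in a few lines. This is a toy, not the thesis's codec: depth maps are flattened to lists, quantization is a single uniform step, and the intra/difference frame layout is an assumption. The key point it shows is that server and client share the proxy's predicted depth, so only small residues (or their temporal differences) cross the network.

```python
# Toy proxy-residue codec: residue = captured - proxy; "I" frames code the
# residue directly, "D" frames code its change since the previous frame.

def encode_frame(captured, proxy, prev_residue=None, step=1.0):
    """Return (mode, payload) for one depth frame."""
    residue = [c - p for c, p in zip(captured, proxy)]
    if prev_residue is None:
        return "I", [round(r / step) for r in residue]        # intra frame
    diff = [r - pr for r, pr in zip(residue, prev_residue)]   # temporal coherence
    return "D", [round(d / step) for d in diff]

def decode_frame(mode, payload, proxy, prev_residue=None, step=1.0):
    """Return (reconstructed depth, residue) so the next frame can chain."""
    vals = [q * step for q in payload]
    residue = vals if mode == "I" else [v + pr for v, pr in zip(vals, prev_residue)]
    return [p + r for p, r in zip(proxy, residue)], residue
```

"I" frames give the random access mentioned above, while runs of "D" frames give the high compression, mirroring the intra-coded / difference-coded split in the text.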

 

Year of completion:  2008
 Advisor: P. J. Narayanan

Related Publications

  • Pooja Verlani, P. J. Narayanan - Proxy-Based Compression of 2-1/2D Structure of Dynamic Events for Tele-immersive Systems, Proceedings of 3D Data Processing, Visualization and Transmission (3DPVT), June 18-20, 2008, Georgia Institute of Technology, Atlanta, GA, USA. [PDF]

  • Pooja Verlani, P. J. Narayanan - Parametric Proxy-Based Compression of Multiple Depth Movies of Humans, Proceedings of the Data Compression Conference (DCC) 2008, March 25-27, 2008, Salt Lake City, Utah. [PDF]

  • Pooja Verlani, Aditi Goswami, P.J. Narayanan, Shekhar Dwivedi and Sashi Kumar Penta - Depth Image: Representations and Real-time Rendering, Third International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT), Chapel Hill, North Carolina, June 14-16, 2006. [PDF]

 



 

Projected Texture for 3D Object Recognition


Avinash Sharma (homepage)

Three dimensional objects are characterized by their shape, which can be thought of as the variation in depth over the object from a particular viewpoint. These variations could be deterministic, as in the case of rigid objects, or stochastic, for surfaces containing a 3D texture. The depth variations are lost during imaging, and what remains is the intensity variation induced by the shape and lighting, as well as focus variations. Algorithms that utilize 3D shape for classification try to recover the lost 3D information from the intensity or focus variations, or from additional cues such as multiple images or structured lighting. This process is computationally intensive and error prone. Once the depth information is estimated, one needs to characterize the object using shape descriptors for classification. Image-based classification algorithms instead try to characterize the intensity variations of the image for recognition. As noted, the intensity variations are affected by the illumination and pose of the object, so such algorithms attempt to derive descriptors that are invariant to changes in lighting and pose. Although image-based classification algorithms are more efficient and robust, their classification power is limited because the 3D information is lost during imaging. Our problem is to find an image-based recognition method that utilizes the shape of the object without explicitly recovering it, implicitly avoiding the high computational cost of shape recovery while achieving high accuracy. The method should be robust to view variation and occlusion, invariant to the scale and position of the object, and able to handle partially specular and texture-less object surfaces. We propose the use of structured lighting patterns, which we refer to as projected texture, for the purpose of object recognition.
The depth variations of the object induce deformations in the projected texture, and these deformations encode the shape information. The primary idea is to view the deformation pattern as a characteristic property of the object and use it directly for classification, instead of trying to recover the shape explicitly. To achieve this we need an appropriate projection pattern and features that sufficiently characterize the deformations. The patterns required can be quite different depending on the nature of the object shape and its variation across objects. Specifically, we look at three different recognition problems and propose appropriate projection patterns, deformation characterizations, and recognition algorithms for each. The first category of objects has fixed shape and pose, where minor differences in shape discriminate between classes; 3D hand geometry recognition is taken as the example. The second problem is category recognition of rigid objects from arbitrary viewpoints, for which we propose a classification algorithm based on the popular bag-of-words paradigm. The third problem is 3D texture classification, where the depth variation of the surface is stochastic in nature; we propose a set of simple texture features that capture the deformations of projected lines on 3D textured surfaces. These approaches have been implemented, verified, tested, and compared on various datasets, both collected by us and available on the Internet. The analysis and comparative results demonstrate significant improvement over existing approaches in terms of accuracy and robustness.
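The "use the deformation directly as a feature" idea can be illustrated with a toy: for one projected stripe, summarize how far the observed stripe bends away from the straight line it would form on a flat surface. The two statistics below (bending energy and roughness) are illustrative stand-ins, not the thesis's exact features.

```python
# Toy deformation features for one projected stripe: ys[i] is the observed
# row position of the stripe in image column i. On a perfectly flat surface
# all values would be equal; depth variation bends the stripe.

def stripe_deformation_features(ys):
    """Return (bending energy, roughness) of one observed stripe."""
    n = len(ys)
    mean = sum(ys) / n
    dev = [y - mean for y in ys]
    energy = sum(d * d for d in dev) / n                       # overall bending
    roughness = sum(abs(ys[i + 1] - ys[i]) for i in range(n - 1)) / (n - 1)
    return (energy, roughness)
```

A flat surface gives (0, 0); a rough 3D texture gives high roughness; such per-stripe features can then feed an ordinary classifier, with no depth map ever reconstructed.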

 

Year of completion:  2008
 Advisor : Anoop M. Namboodiri

Related Publications

  • Avinash Sharma and Anoop M. Namboodiri - Object Category Recognition with Projected Texture, IEEE Sixth Indian Conference on Computer Vision, Graphics & Image Processing (ICVGIP 2008), pp. 374-381, 16-19 Dec 2008, Bhubaneswar, India. [PDF]

  • Visesh Chari, Avinash Sharma, Anoop M Namboodiri and C.V. Jawahar - Frequency Domain Visual Servoing using Planar Contours, IEEE Sixth Indian Conference on Computer Vision, Graphics & Image Processing (ICVGIP 2008), pp. 87-94, 16-19 Dec 2008, Bhubaneswar, India. [PDF]

  • Avinash Sharma and Anoop M. Namboodiri - Projected Texture for Object Classification, Proceedings of the 10th European Conference on Computer Vision (ECCV 2008), 12-18 Oct 2008, France. [PDF]

  • Avinash Sharma, Nishant Shobhit and Anoop M. Namboodiri - Projected Texture for Hand Geometry based Authentication, Proceedings of the CVPR Workshop on Biometrics, 28 June 2008, Anchorage, Alaska, USA. IEEE Computer Society, 2008. [PDF]

 



 

 

Real Time Rendering of Implicit Surfaces on the GPU


Jag Mohan Singh

Generating visually realistic models is one of the core problems of Computer Graphics. Rasterization, i.e. scan converting primitives such as triangles, is one method of rendering; it suffers from an inexact representation, since the triangles are themselves an approximation of the underlying geometry. Ray tracing the primitives is another method of rendering; it represents the underlying geometry exactly and looks visually realistic. We therefore ray trace implicit surfaces rather than polygonizing them. Programmable graphics processing units (GPUs) have high computational capability but relatively limited bandwidth for data access. Consequently, a compact representation of geometry using a suitable procedural or mathematical model, combined with ray tracing as the mode of rendering, fits the GPU well. An implicit surface can be represented as S(x,y,z) = 0, which along a ray becomes the equation F_f(t) = 0; ray tracing S(x,y,z) = 0 amounts to computing the roots of F_f(t) = 0 for every pixel on the screen. Analytical methods can be used for surfaces up to order 4. We compute the interval extension of a function exactly by evaluating it at its points of maxima and minima and at the end points; since we can compute roots of functions up to order 4, we can compute the points of maxima and minima of functions up to order 5. We therefore use interval arithmetic, via Mitchell's algorithm, for surfaces up to order 5; interval methods provide a robust way to isolate roots. The marching points algorithm marches along the ray in equal step sizes until a root is found, detected by a sign change in the function; it wastes computation by evaluating the function at many points. The adaptive marching points algorithm instead marches adaptively to find the root.
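The marching points idea can be sketched on the CPU (the actual method runs per-pixel on the GPU, and the adaptive variant varies the step size; the fixed step and bisection refinement below are simplifications):

```python
# Sketch of marching points along a ray: step in fixed increments,
# evaluate F(t) = S(o + t*d), and report the first sign change as a
# root-bracketing interval; bisection then pins down the hit point.

def march_ray(F, t0, t1, step):
    """Return (ta, tb) bracketing the first root of F in [t0, t1], or None."""
    t, f_prev = t0, F(t0)
    while t < t1:
        t_next = min(t + step, t1)
        f = F(t_next)
        if f_prev == 0.0 or f_prev * f < 0.0:   # sign change: root isolated
            return (t, t_next)
        t, f_prev = t_next, f
    return None

def bisect(F, ta, tb, iters=40):
    """Refine a bracketed root by bisection."""
    for _ in range(iters):
        tm = 0.5 * (ta + tb)
        if F(ta) * F(tm) <= 0.0:
            tb = tm
        else:
            ta = tm
    return 0.5 * (ta + tb)
```

The waste the text describes is visible here: every step costs one evaluation of F whether or not a root is near, which is what the adaptive step-size strategy addresses.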
Though only fourth or lower order surfaces can be rendered using analytical roots, our adaptive marching points algorithm can ray trace arbitrary implicit surfaces exactly, by sampling the ray at selected points until a root is found. Adapting the sampling step size based on a proximity measure and a horizon measure delivers high speed; the horizon measure adapts the sampling near silhouettes and yields good quality silhouettes. We also provide a Taylor test, which has flavours of interval arithmetic and helps render surfaces robustly with the adaptive marching points algorithm. While evaluating S(x,y,z) along the ray, we never form the ray-dependent polynomial F_f(t) from its coefficients in t: there are O(d^3) such coefficients, where d is the degree of the surface, and they are expensive to compute, whereas we only need the value of S(x,y,z) at a point. The derivative F'_f(t) can also be calculated efficiently from the gradient of S as grad(S(x,y,z)) dot D_f. The Barth decic, for example, can be evaluated as S(x,y,z) using about 30 terms, but computing all 11 coefficients of the tenth order polynomial F_f(t) requires evaluating 1373 terms. We also render dynamic implicit surfaces that vary with time. Overall, a simple algorithm that fits the SIMD architecture of the GPU results in high performance. We ray trace algebraic surfaces up to order 18 and non-algebraic surfaces, including a Blinn blobby with 30 spheres, at better than interactive frame rates. Adaptive marching points is an ideal match for the SIMD model of the GPU due to the low computational cost per operation. Using analytical methods for surfaces up to order 4, we achieve 3750 fps on a cubic surface and 1400 fps on a quartic surface. Using the robust Mitchell method on surfaces up to order 5, we achieve up to 400 fps on a torus (a quartic) and 85 fps on a quintic surface.
Our adaptive marching points method renders high order implicit surfaces at interactive frame rates; we render a surface of order 18 at 158 fps. These experiments used an NVIDIA 8800 GTX at a resolution of 512x512. Our GPU Objects system renders the Bunny with 35,947 spheres at 57 fps and with 99,130 spheres at 30 fps, and a hyperboloid with reflection and refraction at 300 fps; an NVIDIA 6600 GTX with a 512x512 viewport was used for the GPU Objects experiments.
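The "evaluate S directly, never expand F_f(t)" point can be demonstrated with a small sketch. A unit sphere stands in here for higher-order surfaces like the Barth decic; the Newton refinement is one possible root polish, not the thesis's exact scheme.

```python
# Evaluate F(t) = S(o + t*d) at ray points directly, and get the derivative
# via the chain rule F'(t) = grad(S) . d, without ever computing the O(d^3)
# polynomial coefficients of F in t.

def make_F(S, gradS, o, d):
    def F(t):
        p = [o[i] + t * d[i] for i in range(3)]
        return S(p)
    def Fp(t):
        p = [o[i] + t * d[i] for i in range(3)]
        g = gradS(p)
        return sum(g[i] * d[i] for i in range(3))   # chain rule: grad(S) . d
    return F, Fp

# Unit sphere S(x,y,z) = x^2 + y^2 + z^2 - 1 and its gradient.
S = lambda p: p[0]**2 + p[1]**2 + p[2]**2 - 1.0
gradS = lambda p: (2 * p[0], 2 * p[1], 2 * p[2])

# Ray from (0,0,-2) along +z; the near intersection is at t = 1.
F, Fp = make_F(S, gradS, o=(0.0, 0.0, -2.0), d=(0.0, 0.0, 1.0))
t = 0.5
for _ in range(20):          # Newton refinement from t = 0.5 converges to 1
    t -= F(t) / Fp(t)
```

For a degree-d surface the saving is exactly the one quoted above: one evaluation of S per sample versus expanding all O(d^3) coefficients of F_f(t) per ray.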


 

Year of completion:  December 2008
 Advisor: P. J. Narayanan

Related Publications

  • Jag Mohan Singh and P. J. Narayanan - Real-Time Ray Tracing of Implicit Surfaces on the GPU, IEEE Transactions on Visualization and Computer Graphics, Vol. 16(2), pp. 261-272, 2010. [PDF]

  • Visesh Chari, Jag Mohan Singh and P. J. Narayanan - Augmented Reality using Over-Segmentation, Proceedings of the National Conference on Computer Vision Pattern Recognition Image Processing and Graphics (NCVPRIPG'08), Jan 11-13, 2008, DA-IICT, Gandhinagar, India. [PDF]

  • Kedarnath Thangudu, Lakshmi Gade, Jag Mohan Singh and P. J. Narayanan - Point Based Representations for Hierarchical Environments, In International Conference on Computing: Theory and Applications (ICCTA), Kolkata, 2007. [PDF]

  • Jag Mohan Singh and P.J. Narayanan - Progressive Decomposition of Point Clouds Without Local Planes, 5th Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), Madurai, India, LNCS 4338, pp. 364-375, 2006. [PDF]

  • Sunil Mohan Ranta, Jag Mohan Singh and P.J. Narayanan - GPU Objects, 5th Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), Madurai, India, LNCS 4338, pp. 352-363, 2006. [PDF]



 

 

Multiple View Geometry Applications to Robotic and Visual Servoing


Visesh Chari (homepage)

Computer Vision may be described as the process of scene understanding through the analysis of images captured by a camera. Understanding a scene has several aspects, which makes computer vision a vibrant and active field of research. For example, one section of computer vision research concentrates on understanding the inherent characteristics of an object (making identifications like "this is a face", "this is a car", etc.). Another branch focuses on answering questions like "find the given face in this image" or "find where cars occur in this image". Another, more primitive branch concerns itself with estimating the geometry of the scene, answering questions like "what is the shape of this face" or "how would this car look from that viewpoint". This branch and its various derivatives go under the name Multiple View Geometry. Technically, Multiple View Geometry (MVG) concerns itself with the geometric interaction of the 3D world with images captured by a camera, and with the interpretation and manipulation of this information for various tasks. MVG research is two decades old and borrows heavily from the related field of photogrammetry. Over these years, many algorithms have been proposed for estimating geometric quantities such as the transformation between cameras viewing a scene, or the 3D structure of an object viewed by multiple cameras. The field has matured recently, with focus shifting towards producing globally optimal estimates of geometric quantities like transformations and structure, and towards the analysis of cases where geometric inference or manipulation is NP-complete. Even before this maturity, many MVG algorithms found applications: the simple mosaicing feature available in digital cameras these days owes its origin to one such algorithm.
Applications have also appeared in areas like animation for films, robot motion in automated surgery and industrial environments, security systems that employ hundreds of cameras, etc.


 

This thesis focuses on the application front of Multiple View Geometry, which has started gaining popularity. To this end, we leverage concepts from MVG to develop new frameworks and algorithms for a variety of problems.

For this reason, we explore the use of MVG in various robotics and computer vision tasks in this thesis. We first propose a tracking framework that utilizes cues like texture and edges to track 2D and 3D objects in various views of a scene. Tracking refers to the task of estimating the location and orientation of an object with respect to a pre-defined world coordinate system. Traditionally, filters like the Kalman filter and its variants have been used for tracking. Problems like illumination change and occlusion affect many of these algorithms, which make assumptions such as uniform intensity of objects across views. We show that by embedding MVG into tracking algorithms, we can achieve efficient tracking of objects that is robust to large changes in perspective, illumination and occlusion. A by-product is the estimation of the pose of the camera, which is itself useful for tasks like localization in a mobile environment.
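For background, the traditional filtering baseline mentioned above can be sketched as a 1-D constant-velocity Kalman filter tracking an object's position. This is a textbook illustration of the baseline, not the thesis's MVG-based tracker; the noise levels q and r are illustrative assumptions.

```python
# 1-D constant-velocity Kalman filter: predict with x' = x + v*dt,
# then correct with a position measurement z.

class Kalman1D:
    def __init__(self, x0, v0, q=1e-3, r=0.1):
        self.x = [x0, v0]                  # state: position, velocity
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # state covariance
        self.q, self.r = q, r              # process / measurement noise

    def step(self, z, dt=1.0):
        # Predict with the constant-velocity motion model.
        x, v = self.x
        x_pred, v_pred = x + v * dt, v
        P = self.P
        P00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + self.q
        P01 = P[0][1] + dt * P[1][1]
        P10 = P[1][0] + dt * P[1][1]
        P11 = P[1][1] + self.q
        # Correct with the position measurement z.
        S = P00 + self.r
        K0, K1 = P00 / S, P10 / S          # Kalman gain
        y = z - x_pred                      # innovation
        self.x = [x_pred + K0 * y, v_pred + K1 * y]
        self.P = [[(1 - K0) * P00, (1 - K0) * P01],
                  [P10 - K1 * P00, P11 - K1 * P01]]
        return self.x[0]
```

Its weakness is the one the text identifies: the measurement model assumes the object's appearance (and hence the measured position) is reliable across views, which breaks under occlusion and illumination change.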

 

 

Then we show an application of frequency-domain MVG to the task of robot positioning. Positioning (or visual servoing) is the task of making a robot assume a desired pose with respect to an object of interest, with the help of a camera. This object might be a heart, as in surgery, or an automobile part, as in industrial settings. We show that by using frequency domain techniques in MVG, we can obtain algorithms that require only rough correspondence between images, unlike earlier algorithms that needed specific point-to-point correspondences. This is further developed into a general servoing framework capable of straight Cartesian paths and path following, which are recent problems in servoing.
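The flavour of frequency-domain alignment without point correspondences can be illustrated with phase correlation, which recovers a translation between two images from their spectra alone. This is only the simplest such technique (pure circular translation); the servoing work above handles richer planar motion.

```python
# Phase correlation: the normalized cross-power spectrum of two images
# is a pure phase ramp whose inverse FFT peaks at their relative shift.
import numpy as np

def phase_correlation(a, b):
    """Return (dy, dx) such that b == np.roll(a, (dy, dx), axis=(0, 1))."""
    Fa, Fb = np.fft.fft2(a), np.fft.fft2(b)
    cross = np.conj(Fa) * Fb
    cross /= np.abs(cross) + 1e-12          # keep phase, drop magnitude
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap peaks past the midpoint to negative shifts.
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, corr.shape))
```

Note that no feature in a is ever matched to a feature in b; the whole-image spectra carry the alignment, which is the sense in which only "rough correspondence" between the images is needed.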

 

 

 

Within computer vision, we explore the use of MVG for various image and video editing tasks. Tasks like removing a scene object from a video in a consistent manner fall in this category (predicting how the video would look without the object). In this area, we propose an algorithm for video inpainting, where specific objects are removed from a video and the resulting space-time holes are filled in a consistent manner. The algorithm is fully automatic, unlike traditional image and video inpainting algorithms, and takes as input two functions: one defines the object to be removed, and the other defines the background model used for hole-filling.
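The two inputs described above can be illustrated per-frame with a toy: a mask selecting the object to remove, and a background model supplying the fill. A real video inpainting pipeline must additionally enforce consistency across space-time, which this sketch ignores.

```python
# Toy per-frame hole filling: where mask is True (the removed object),
# take the pixel from the background model; elsewhere keep the frame.

def fill_holes(frame, mask, background):
    """frame, mask, background: 2-D lists of equal shape."""
    return [[background[y][x] if mask[y][x] else frame[y][x]
             for x in range(len(frame[0]))] for y in range(len(frame))]
```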

 

 

 

 


We then extend this algorithm to image extrapolation, which is concerned with predicting unseen content of a scene from what is available. This differs from inference in that no data is available to confirm our predictions, so several alternatives remain equally viable. In this direction, we propose an inpainting-based framework for Image Based Rendering (IBR). IBR concerns itself with algorithms for an image-based representation of the 3D information of a scene, from which novel views can then be rendered. We extend IBR to cases where the 3D information about a scene is incomplete, by incorporating information about the type of scene being viewed (e.g. the face of a person). We then devise algorithms to transfer specific semantic characteristics to the current scene from similar scenes available to us.

 

 

 

Year of completion:  December 2008
 Advisors: C. V. Jawahar & P. J. Narayanan

Related Publications

  • Visesh Chari, Avinash Sharma, Anoop M Namboodiri and C.V. Jawahar - Frequency Domain Visual Servoing using Planar Contours, IEEE Sixth Indian Conference on Computer Vision, Graphics & Image Processing (ICVGIP 2008), pp. 87-94, 16-19 Dec 2008, Bhubaneswar, India. [PDF]

  • Visesh Chari, C. V. Jawahar, P. J. Narayanan - Video Completion as Noise Removal, Proceedings of the National Conference on Communications (NCC'08), Feb 1-3, 2008, IIT Mumbai, India. [PDF]

  • Visesh Chari, Jag Mohan Singh and P. J. Narayanan - Augmented Reality using Over-Segmentation, Proceedings of the National Conference on Computer Vision Pattern Recognition Image Processing and Graphics (NCVPRIPG'08), Jan 11-13, 2008, DA-IICT, Gandhinagar, India. [PDF]

  • A.H. Abdul Hafez, Visesh Chari and C.V. Jawahar - Combining Texture and Edge Planar Tracker based on a Local Quality Metric, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA'07), Roma, Italy, 2007. [PDF]

 


