Private Outlier Detection and Content based Encrypted Search
Nisarg Raval (homepage)
At present, two approaches are widely used to ensure security of the data. First one is based on Secure Multiparty Computations (SMC), where users communicate with each other using SMC protocol to securely evaluate any function on their data. SMC gives provable security without compromising utility of the data. However, the solutions based on the general protocol of SMC requires enormous computational and communication overhead, thus limiting the practical deployment of the secure algorithms. The second approach is to encrypt the data at the user end and perform any operations in the encrypted domain to preserve the confidentiality of the data. This leads to difficulty in performing various operations like search, retrieval and computations. Homomorphic encryption allows secure computations over encrypted data but its impractical due to high computational cost.
In this thesis, we focus on developing privacy preserving, communication and computationally efficient algorithms in the domain of Data Mining and Information Retrieval. Specifically, we design efficient privacy preserving solutions for two widely used tasks - outlier detection and content based similarity search over encrypted data. Private computation of outliers among data collected from different sources has many applications in the area of data mining. First, we propose a novel algorithm of distance based outlier detection using Locality Sensitive Hashing (LSH) in a distributed setting where data is horizontally partitioned among data owners. We show that the proposed algorithm is scalable in case of large datasets with high dimensionality. Then, we extend our distributed outlier algorithm to propose an algorithm for private outlier detection that broke the previous known bounds of quadratic cost. We compare our algorithm with the state of the art technique and show that our algorithm is better both in terms of computational as well as communication complexity.
Next, we explore the properties of hierarchical index structures and propose an algorithm for content based similarity search over encrypted data that do not reveal user query to the server. To the best of our knowledge, we are the first one to propose a definition and a scheme for the problem of Content Similarity Searchable Symmetric Encryption (CS-SSE). Finally, we extend our CS-SSE scheme for Private Content Based Image Retrieval, which is essential for many applications like searching health records using CT scans and patent search for logos. We also show that our algorithm achieves optimal computational cost with similar accuracy and privacy when compared to the existing techniques.
|Year of completion:||2013|
|Advisor :||C. V. Jawahar and Dr. Kannan Srinathan|