Summary
Cloud storage is widely used for both personal and enterprise needs. Despite its advantages, however, many potential users with enormous amounts of sensitive data (big data) refrain from fully using cloud storage because of valid concerns about data privacy. An established solution to the cloud data privacy problem is to encrypt data on the client end. This approach, however, restricts data processing capabilities (e.g., searching over the data). Accordingly, the research problem we investigate is how to enable real-time searching over encrypted big data in the cloud. In particular, semantic search is of interest to clients dealing with big data. To address this problem, we develop a system (termed S3BD) for searching big data using cloud services without exposing any data to cloud providers. To keep response times real-time on big data, S3BD proactively prunes the search space to a subset of the whole dataset. For that purpose, we propose a method to cluster the encrypted data. An abstract of each cluster is maintained on the client end to direct the search operation to the appropriate clusters at search time. Results of experiments carried out on real-world big datasets demonstrate that the search operation can be achieved in real time and is significantly more efficient than its counterparts. In addition, a fully functional prototype of S3BD is made publicly available.
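The pruning idea can be illustrated with a minimal sketch. It assumes that each document is reduced to a set of keywords, that keywords are replaced by HMAC-SHA256 tokens before leaving the client, and that a cluster's abstract is simply the union of its documents' tokens; the key, the cluster and document names, and the functions `trapdoor`, `build_index`, `select_clusters`, and `search` are all hypothetical. It scores by plain keyword overlap rather than semantic similarity, so it is not S3BD's actual clustering or ranking method, and it makes no security claim (deterministic tokens, for example, reveal when two documents share a term).

```python
import hashlib
import hmac

# Hypothetical client-side secret; a real client would generate and protect its own key.
SECRET_KEY = b"hypothetical-client-key"


def trapdoor(term: str) -> str:
    """Keyed hash of a term, so the cloud only ever sees opaque tokens."""
    return hmac.new(SECRET_KEY, term.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(clusters: dict[str, dict[str, set[str]]]):
    """Return per-document token sets (uploaded to the cloud) and
    per-cluster abstracts (kept on the client)."""
    encrypted_index = {
        cid: {doc_id: {trapdoor(t) for t in terms} for doc_id, terms in docs.items()}
        for cid, docs in clusters.items()
    }
    # Assumption: a cluster's abstract is the union of its documents' tokens.
    abstracts = {cid: set().union(*docs.values()) for cid, docs in encrypted_index.items()}
    return encrypted_index, abstracts


def select_clusters(tokens: set[str], abstracts: dict[str, set[str]], top_k: int = 1) -> list[str]:
    """Client side: rank clusters by query-token overlap and keep the best top_k."""
    ranked = sorted(((len(tokens & a), cid) for cid, a in abstracts.items()),
                    key=lambda pair: (-pair[0], pair[1]))
    return [cid for score, cid in ranked[:top_k] if score > 0]


def search(query_terms, encrypted_index, abstracts, top_k: int = 1) -> list[str]:
    """Score only the documents inside the selected clusters (the pruned search space)."""
    tokens = {trapdoor(t) for t in query_terms}
    hits = []
    for cid in select_clusters(tokens, abstracts, top_k):
        for doc_id, doc_tokens in encrypted_index[cid].items():
            score = len(tokens & doc_tokens)
            if score:
                hits.append((score, doc_id))
    return [doc_id for score, doc_id in sorted(hits, key=lambda h: (-h[0], h[1]))]


clusters = {
    "health": {"d1": {"diabetes", "insulin"}, "d2": {"insulin", "dose"}},
    "finance": {"d3": {"stock", "bond"}, "d4": {"bond", "yield"}},
}
index, abstracts = build_index(clusters)
print(search(["insulin", "dose"], index, abstracts))  # ['d2', 'd1']
```

In this example the "finance" cluster is never scanned, because its abstract shares no tokens with the query; on a large dataset, skipping such clusters is what keeps the search cost proportional to a subset of the data rather than the whole of it.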
Capabilities for trustworthy cloud-based computing and data storage require usable, secure, and efficient solutions that allow clients to remotely store and process their data in the cloud. In this paper, we present RESeED, a tool that provides user-transparent and cloud-agnostic search over encrypted data using regular expressions, without requiring cloud providers to change their existing infrastructure. When a client asks RESeED to upload a new file to the cloud, RESeED analyzes the file's content, updates novel data structures accordingly, and encrypts and transfers the new data to the cloud. RESeED provides regular expression search over this encrypted data by translating queries on the fly into finite automata and analyzing efficient and secure representations of the data before downloading the encrypted files from the cloud. We experimentally evaluate a working prototype of RESeED (currently publicly available) and demonstrate the scalability and correctness of our approach using real-world datasets from arXiv.org and the IETF. RESeED achieves absolute accuracy with very low (6%) overhead and high performance, even beating grep on some benchmarks.
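A simplified sketch of the general pattern (evaluate the regular expression on the client, send only opaque tokens to the cloud) is shown below. It assumes a client-side vocabulary of plaintext words and a cloud-side inverted index keyed by HMAC-SHA256 tokens. Python's `re` module stands in for RESeED's query-to-automaton translation, patterns are matched against single words only, and file encryption is omitted. The class `Client`, its methods, and the key are hypothetical and do not reflect RESeED's actual data structures.

```python
import hashlib
import hmac
import re

KEY = b"hypothetical-client-key"  # assumption: secret held only by the client


def token(word: str) -> str:
    """Opaque keyed token for a word; the cloud never sees the plaintext word."""
    return hmac.new(KEY, word.encode("utf-8"), hashlib.sha256).hexdigest()


class Client:
    def __init__(self):
        self.vocabulary: set[str] = set()           # plaintext words, kept client-side only
        self.cloud_index: dict[str, set[str]] = {}  # simulates the cloud: token -> file ids

    def upload(self, file_id: str, text: str) -> None:
        """Analyze the file's content and update the vocabulary and the cloud-side index."""
        for word in set(re.findall(r"\w+", text)):
            self.vocabulary.add(word)
            self.cloud_index.setdefault(token(word), set()).add(file_id)
        # Encrypting and uploading the file body itself is omitted in this sketch.

    def regex_search(self, pattern: str) -> set[str]:
        """Match the pattern against whole words locally, then look up only opaque tokens."""
        compiled = re.compile(pattern)  # stand-in for an explicit finite-automaton construction
        result: set[str] = set()
        for word in self.vocabulary:
            if compiled.fullmatch(word):
                result |= self.cloud_index.get(token(word), set())
        return result  # ids of encrypted files the client would then download


c = Client()
c.upload("rfc1", "TCP congestion control")
c.upload("rfc2", "UDP checksum")
print(sorted(c.regex_search(r"[TU][CD]P")))  # ['rfc1', 'rfc2']
print(sorted(c.regex_search(r"cong.*")))     # ['rfc1']
```

Keeping the vocabulary on the client means the regular expression itself never reaches the cloud; the cloud only learns which tokens were looked up. The trade-off is that the client must store every distinct word, and that storage grows with the corpus.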