As data continues to be generated at exponentially growing rates in heterogeneous formats, fast analytics to extract meaningful information is becoming increasingly important. Systems widely use in-memory caching as one of their primary techniques to speed up data analytics. However, caches in data analytics systems cannot rely on simple caching policies and a fixed data layout to achieve good performance. Different datasets and workloads require different layouts and policies to achieve optimal performance. This paper presents ReCache, a cache-based performance accelerator that is reactive to the cost and heterogeneity of diverse raw data formats. Using timing measurements of caching operations and selection operators in a query plan, ReCache accounts for the widely varying costs of reading, parsing, and caching data in nested and tabular formats. Combining these measurements with information about frequently accessed data fields in the workload, ReCache automatically decides whether a nested or relational columnoriented layout would lead to better query performance. Furthermore, ReCache keeps track of commonly utilized operators to make informed cache admission and eviction decisions. Experiments on synthetic and real-world datasets show that our caching techniques decrease caching overhead for individual queries by an average of 59%. Furthermore, over the entire workload, ReCache reduces execution time by 19-75% compared to existing techniques. PVLDB Reference Format:Tahir Azim, Manos Karpathiotakis and Anastasia Ailamaki. ReCache: Reactive Caching for Fast Analytics over Heterogeneous Data. PVLDB, 11(3): xxxx-yyyy, 2017.
Consumer data, such as documents and photos, are increasingly being stored in the cloud. To ensure confidentiality, the data is encrypted before being outsourced. However, this makes it difficult to search the encrypted data directly. Recent approaches to enable searching of this encrypted data have relied on trapdoors, locally stored indexes, and homomorphic encryption. However, data communication cost for these techniques is very high and they limit search capabilities either to a small set of pre-defined trapdoors or only allow exact keyword matching.This paper addresses the problem of high communication cost of bloom-filter based privacy-preserving search for measuring similarity between the search query and the encrypted data. We propose a novel compression algorithm which avoids the need to send the entire encrypted bloom filter index back to the client. This reduces the cost of communicating results to the client by over 95%. Using sliding window bloom filters and homomorphic encryption also enables our system to search over encrypted data using keywords that only partially match the originally stored words. We demonstrate the viability of our system by implementing it on Google Cloud, and our results show that the cost of partially matching search queries on encrypted data scales linearly with the total number of keywords stored on the server.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.