Bloom filter (BF) has been widely used to support membership query, i.e., to judge whether a given element x is a member of a given set S or not. Recent years have seen a flourish design explosion of BF due to its characteristic of space-efficiency and the functionality of constant-time membership query. The existing reviews or surveys mainly focus on the applications of BF, but fall short in covering the current trends, thereby lacking intrinsic understanding of their design philosophy. To this end, this survey provides an overview of BF and its variants, with an emphasis on the optimization techniques. Basically, we survey the existing variants from two dimensions, i.e., performance and generalization. To improve the performance, dozens of variants devote themselves to reducing the false positives and implementation costs. Besides, tens of variants generalize the BF framework in more scenarios by diversifying the input sets and enriching the output functionalities. To summarize the existing efforts, we conduct an in-depth study of the existing literature on BF optimization, covering more than 60 variants. We unearth the design philosophy of these variants and elaborate how the employed optimization techniques improve BF. Furthermore, comprehensive analysis and qualitative comparison are conducted from the perspectives of BF components. Lastly, we highlight the future trends of designing BFs. This is, to the best of our knowledge, the first survey that accomplishes such goals.
A bloom filter is a simple, space-efficient, randomized data structure for concisely representing a static data set, in order to support approximate membership queries. It has great potential for distributed applications where systems need to share information about what resources they have. The space efficiency is achieved at the cost of a small probability of false positive in membership queries. However, for many applications the space savings and short locating time consistently outweigh this drawback. In this paper, we introduce dynamic bloom filters (DBF) to support concise representation and approximate membership queries of dynamic sets, and study the false positive probability and union algebra operations. We prove that DBF can control the false positive probability at a low level by adjusting the number of standard bloom filters used according to the actual size of current dynamic set. The space complexity is also acceptable if the actual size of dynamic set does not deviate too much from the predefined threshold. Furthermore, we present multidimension dynamic bloom filters (MDDBF) to support concise representation and approximate membership queries of dynamic sets in multiple attribute dimensions, and study the false positive probability and union algebra operations through mathematic analysis and experimentation. We also explore the optimization approach and three network applications of bloom filters, namely bloom joins, informed search, and global index implementation. Our simulation shows that informed search based on bloom filters can obtain higher recall and success rate of query than the blind search protocol.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.