What web archives hold is often opaque to the public, and even experts in the domain struggle to provide precise assessments. Given the increasing need for and use of crawled and archived web resources, discovery of individual records and sharing of entire holdings are pressing use cases. We investigate Bloom Filters (BFs) and their applicability to these use cases. We experiment with and analyze the parameters for their creation, measure their performance, outline an approach for scalability, and describe several pilot implementations that showcase their potential to meet our needs. BFs have beneficial characteristics and have therefore enjoyed popularity in various domains. We highlight their suitability for web archiving use cases and how they can contribute to very fast and accurate search services.