2013
DOI: 10.1145/2493175.2493178

Fast candidate generation for real-time tweet search with bloom filter chains

Abstract: The rise of social media and other forms of user-generated content has created the demand for real-time search: against a high-velocity stream of incoming documents, users desire a list of relevant results at the time the query is issued. In the context of real-time search on tweets, this work explores candidate generation in a two-stage retrieval architecture where an initial list of results is processed by a second-stage rescorer to produce the final output. We introduce Bloom filter chains, a novel extensi…

Citations: cited by 24 publications (16 citation statements)
References: 63 publications
“…We found that the idf values in both conditions were nearly identical across all query terms; this is not surprising considering that idf is on a log scale, and it takes substantial variations in document frequencies to have a noticeable effect on the value. However, Figure 1 shows that there is a large difference in effectiveness for a few topics: MB15, MB17, and MB35.…”
Section: Results (supporting)
confidence: 50%
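
The point about the log scale in the statement above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the plain form idf = ln(N / df), which may differ from the variant the citing paper used, and the collection size and document frequencies are hypothetical numbers chosen for the example.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    """Plain log-scaled inverse document frequency, ln(N / df).

    Assumption: the citing paper's exact idf variant is not given in the
    excerpt, so this is the simplest textbook form.
    """
    return math.log(num_docs / doc_freq)


NUM_DOCS = 1_000_000  # hypothetical collection size

baseline = idf(NUM_DOCS, 1_000)       # term occurs in 1,000 documents
slightly_off = idf(NUM_DOCS, 1_100)   # document frequency off by 10%
doubled = idf(NUM_DOCS, 2_000)        # document frequency off by a factor of 2

# A 10% error in df shifts idf by ln(1.1); a 2x error shifts it by ln(2).
shift_small = baseline - slightly_off
shift_large = baseline - doubled

print(f"baseline idf      : {baseline:.3f}")
print(f"shift for 10% df  : {shift_small:.3f}")
print(f"shift for 2x df   : {shift_large:.3f}")
```

Under this form the shift depends only on the ratio of the two document frequencies, so even doubling the document frequency moves idf by a constant ln(2), which is small relative to the baseline value for a rare term.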
“…In this architecture, our experiments consider the candidate generation stage. Additional work has shown that end-to-end retrieval effectiveness is insensitive to the candidate generation algorithm [6,3], which means that our experiments using simple query-likelihood accurately reflect real-world conditions.…”
Section: Discussion (mentioning)
confidence: 98%
“…It requires methods and tools which can effectively extract data via APIs and then analyse these data to extract information of interest [3,10]. Social media research has introduced various methods for the effective collection of disaster-related posts, such as Bloom Filter Chains for real-time tweet search [9], TAKMI technology for content analysis [8] and the Twitter API methods 'crawl' and 'timeline' [11]. Krishnamurthy et al. (2008) used these two methods, both relying on API functions provided by Twitter, for the collection of large amounts of data.…”
Section: A Data Collection Methods (mentioning)
confidence: 99%
“…This limits the applicability of probabilistic data structures to domain-specific use only, such as Genome Sequencing. Another application of probabilistic data structures is big data queries, for instance, BWand [32] for fast query on Twitter tweets, content filtering in MapReduce programs [21], and NoSQL databases such as Google BigTable, Apache HBase and Apache Cassandra. In these cases, the probabilistic data structures are used as an indexing technique to quickly locate information in a distributed storage system.…”
Section: Data Compression (mentioning)
confidence: 99%
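
For readers unfamiliar with the probabilistic data structures mentioned in the last statement, here is a minimal sketch of a standard Bloom filter used as a membership index. It is not the Bloom filter chain structure introduced in the cited paper, nor BWand; the class name, the default sizes, and the use of SHA-256 with a seed prefix as the hash family are all assumptions made for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """A plain Bloom filter: no false negatives, possible false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by hashing the item with a seed prefix.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Hypothetical usage: record which tweet ids contain a given term.
tweets_with_term = BloomFilter()
for tweet_id in ("1001", "1002", "1003"):
    tweets_with_term.add(tweet_id)

print(tweets_with_term.might_contain("1002"))  # an added id is always reported
```

A lookup touches only a fixed number of bits, which is why such filters are attractive as a fast pre-check before consulting slower storage. The cost is that a positive answer can be wrong, so a confirming step is still needed when exact membership matters.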