Diversity-Aware Top-k Publish/Subscribe for Text Stream

Chen, Lisi; Cong, Gao

doi:10.1145/2723372.2749451

Cited by 46 publications

(25 citation statements)

References 38 publications


“…For example, for the models with the Euclidean distance or cosine similarity, many filtering methods have been suggested based on different geometric properties [20,26,30]. However, for the models with textual queries only a few works exist, which cannot be fully applied to our problem as either the setting is monochromatic [21], the indexing tree of the queries should be built which is not scalable to very high dimensions [39], the storage of k nearest neighbors for all the queries is required with a fixed k [9], or only the conjunctive queries are considered [1]. In the remainder of this section, we present our algorithm for the dynamic generation of the RkNNs for the textual data.…”

Section: Generating Exposure Sets (mentioning)

confidence: 99%

“…Finding the set of reverse k nearest neighbors (a.k.a. the influence set) of a point has been studied in various contexts such as matching the user preferences to a given product [33] or the assignment of a publication to a set of subscriptions [1,9]. The setting of such problems falls into either of these categories: monochromatic or bichromatic.…”

Section: Related Work (mentioning)

confidence: 99%
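
To make the bichromatic case in the statement above concrete, here is a minimal brute-force sketch in Python. It treats subscriptions as queries and publications as documents, and returns the queries that rank a given document among their top-k. The function names, the bag-of-words vectors, the cosine scoring, and the tie-breaking by name are all assumptions made for illustration; this is not the algorithm of the cited paper or of the citing work.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, docs, k):
    """Names of the k documents most similar to the query (ties broken by name)."""
    ranked = sorted(docs, key=lambda name: (-cosine(query_vec, docs[name]), name))
    return ranked[:k]


def bichromatic_rknn(target, queries, docs, k):
    """Queries whose top-k result list contains the target document.

    Brute force: every query is re-ranked against every document.
    """
    return sorted(q for q, vec in queries.items() if target in top_k(vec, docs, k))


# Hypothetical toy data: three publications and three subscriptions.
docs = {
    "d1": Counter("stream topk publish".split()),
    "d2": Counter("privacy search exposure".split()),
    "d3": Counter("stream diversity".split()),
}
queries = {
    "q1": Counter("stream".split()),
    "q2": Counter("privacy".split()),
    "q3": Counter("diversity stream".split()),
}

print(bichromatic_rknn("d3", queries, docs, k=1))
```

The monochromatic variant would draw queries and documents from the same set. The filtering and indexing methods mentioned in the citation statements exist precisely to avoid the full re-ranking done here.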

Learning to Un-Rank

Biega

Ghazimatin

Ferhatosmanoğlu

et al. 2017

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

Search engines in online communities such as Twitter or Facebook not only return matching posts, but also provide links to the profiles of the authors. Thus, when a user appears in the top-k results for a sensitive keyword query, she becomes widely exposed in a sensitive context. The effects of such exposure can result in a serious privacy violation, ranging from embarrassment all the way to becoming a victim of organizational discrimination. In this paper, we propose the first model for quantifying search exposure on the service provider side, casting it into a reverse k-nearest-neighbor problem. Moreover, since a single user can be exposed by a large number of queries, we also devise a learning-to-rank method for identifying the most critical queries and thus making the warnings user-friendly. We develop efficient algorithms, and present experiments with a large number of user profiles from Twitter that demonstrate the practical viability and effectiveness of our framework.

“…Diversifying search results given a query in short text streams is of importance and has many applications. For instance, top-k publish/subscribe systems for tweets [7,39] are required to return to a subscriber the top-k recent tweets that are relevant and diversified given a subscribed keyword. The problem of diversifying search results in long text streams has previously been investigated by Refs.…”

Section: Introduction (mentioning)

confidence: 99%

“…The problem of diversifying search results in long text streams has previously been investigated by Refs. [7,33]. Both models penalize redundancy in a ranked list of documents in a stream, where redundancy is directly measured as a sum of pairwise similarities between any two documents.…”

Section: Introduction (mentioning)

confidence: 99%
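
The redundancy measure described in the statement above, a sum of pairwise similarities over a result list, can be sketched as follows. The Jaccard similarity over token sets, the additive relevance term, and the trade-off weight `lam` are assumptions chosen for illustration. The quoted text does not specify the similarity function or how the penalty is combined with relevance in the cited models.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two token collections (assumed similarity)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def redundancy(docs):
    """Sum of pairwise similarities over all unordered pairs in the list."""
    return sum(jaccard(x, y) for x, y in combinations(docs, 2))


def penalized_score(docs, relevance, lam=0.5):
    """Total relevance minus a weighted redundancy penalty (lam is hypothetical)."""
    return sum(relevance) - lam * redundancy(docs)


# Hypothetical result list: the first two documents are identical.
results = [["top", "k"], ["top", "k"], ["text", "stream"]]
print(redundancy(results))
print(penalized_score(results, relevance=[1.0, 1.0, 1.0]))
```

Under such a score, replacing one of the two identical documents with a dissimilar one of equal relevance raises the score, which is the behaviour a diversity-aware ranking is after.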

Search Result Diversification in Short Text Streams

Liang

Yılmaz

Shen

et al. 2017

ACM Trans. Inf. Syst.

We consider the problem of search result diversification for streams of short texts. Diversifying search results in short text streams is more challenging than in the case of long documents, as it is difficult to capture the latent topics of short documents. To capture the changes of topics and the probabilities of documents for a given query at a specific time in a short text stream, we propose a dynamic Dirichlet multinomial mixture topic model, called D2M3, as well as a Gibbs sampling algorithm for the inference. We also propose a streaming diversification algorithm, SDA, that integrates the information captured by D2M3 with our proposed modified version of the PM-2 (Proportionality-based diversification Method, second version) diversification algorithm. We conduct experiments on a Twitter dataset and find that SDA statistically significantly outperforms state-of-the-art non-streaming retrieval methods, plain streaming retrieval methods, as well as streaming diversification methods that use other dynamic topic models.
