2014
DOI: 10.14778/2735471.2735472

Selectivity estimation on streaming spatio-textual data using local correlations

Abstract: In this paper, we investigate the selectivity estimation problem for streaming spatio-textual data, which arises in many social network and geo-location applications. Specifically, given a set of continuously and rapidly arriving spatio-textual objects, each of which is described by a geo-location and a short text, we aim to accurately estimate the cardinality of a spatial keyword query on objects seen so far, where a spatial keyword query consists of a search region and a set of query keywords. To the best of o…
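To make the estimated quantity concrete, here is a minimal sketch of the exact count that a selectivity estimator would approximate. It assumes a rectangular search region and assumes that an object matches when it contains every query keyword (the subset containment semantics that one of the citing papers attributes to this work). The names `SpatioTextualObject` and `exact_count` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatioTextualObject:
    """Hypothetical record: a point location plus a set of keywords."""
    x: float
    y: float
    keywords: frozenset


def exact_count(objects, region, query_keywords):
    """Count objects inside `region` that contain all query keywords.

    `region` is an (xmin, ymin, xmax, ymax) rectangle; this shape is an
    assumption made for illustration only.
    """
    xmin, ymin, xmax, ymax = region
    q = frozenset(query_keywords)
    return sum(
        1
        for o in objects
        if xmin <= o.x <= xmax and ymin <= o.y <= ymax and q <= o.keywords
    )


stream = [
    SpatioTextualObject(1.0, 1.0, frozenset({"coffee", "wifi"})),
    SpatioTextualObject(2.0, 3.0, frozenset({"coffee"})),
    SpatioTextualObject(9.0, 9.0, frozenset({"coffee", "wifi"})),
]
print(exact_count(stream, (0, 0, 5, 5), {"coffee"}))  # 2
```

A linear scan like this needs every object seen so far, which is not feasible for a fast stream; that is the motivation for estimating the count from a compact summary instead.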

Cited by 25 publications (17 citation statements)
References 26 publications
“…The KMV sketch technique has been widely used to esti- [42], [20], [37]. The idea of imposing a global threshold on KMV sketch is first proposed in [37] in the context of term pattern size estimation. However, there is no theoretical analysis for the estimation performance.…”
Section: Related Work (mentioning)
confidence: 99%
“…As shown in [9], Equation 3 can be modified to compound set operation where $L = L_{A_1} \oplus \dots \oplus L_{A_n}$ and $k = \min(k_{A_1}, \dots, k_{A_n})$. An improved KMV sketch, named G-KMV, is proposed to estimate the multi-union size in [28]. G-KMV imposes a global threshold and ensures that all hash values smaller than the threshold will be kept.…”
Section: KMV Synopses (mentioning)
confidence: 99%
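The KMV idea quoted above can be sketched as follows. This is an illustration under stated assumptions, not the implementation from the cited papers: the hash function `h`, the helper names, and the choice of SHA-1 are all hypothetical. The estimator used is the standard KMV form (k - 1) / U_(k), where U_(k) is the k-th smallest hash value scaled into (0, 1]. `threshold_sketch` only mirrors the quoted description of G-KMV (keep every hash value below a global threshold); how G-KMV turns that sketch into an estimate is not stated in the excerpt and is left out.

```python
import hashlib


def h(item):
    """Map an item to a pseudo-uniform float in (0, 1] (assumed hash)."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


def kmv_sketch(items, k):
    """Keep the k smallest distinct hash values, in ascending order."""
    return sorted({h(x) for x in items})[:k]


def kmv_estimate(sketch, k):
    """Estimate the number of distinct items from a KMV sketch."""
    if len(sketch) < k:
        # Fewer than k distinct values were seen, so the count is exact.
        return float(len(sketch))
    return (k - 1) / sketch[k - 1]


def kmv_union(sketch_a, k_a, sketch_b, k_b):
    """Combine two sketches; the result is valid for k = min(k_a, k_b)."""
    k = min(k_a, k_b)
    merged = sorted(set(sketch_a) | set(sketch_b))[:k]
    return merged, k


def threshold_sketch(items, tau):
    """Keep every hash value smaller than a global threshold tau.

    Assumption: this only reflects the one-sentence description of G-KMV
    quoted above, not the full method.
    """
    return sorted(v for v in {h(x) for x in items} if v < tau)


a = kmv_sketch(range(0, 6000), 512)
b = kmv_sketch(range(4000, 10000), 512)
merged, k = kmv_union(a, 512, b, 512)
print(round(kmv_estimate(merged, k)))  # close to the true union size, 10000
```

Taking k as the minimum of the input sizes matters because only the k smallest values of the union are guaranteed to be present in at least one of the input sketches.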
“…Although the set containment search query can be naturally modeled as range counting problem as discussed in Section 1, existing range counting techniques are exponentially dependent on the dimensionality (i.e., number of distinct elements in our problem) and not applicable to solving the containment selectivity estimation problem in our problem ( [13], [23]). Distinct value estimators (e.g., KMV [9], bottom-k, min-hash [13]) are adopted in [28] to solve subset containment search (i.e., query record is a subset of data record). We also extend the distinct value estimator KMV and develop the IL-GKMV approach in Section 3 and demonstrate theoretically and through extensive experiments that distinct value estimators cannot efficiently and accurately support the superset containment semantics studied in this paper.…”
Section: Related Work (mentioning)
confidence: 99%
“…We also analyze that the performance of distinct value estimator-based approach degrades when the vocabulary size is large due to the inherent superset containment semantics of the problem studied in this paper. Wang et al [32] study selectivity estimation on streaming spatio-textual data where the textual data are a set of keywords/terms (i.e., elements). However, the query semantic is different as it specifies a subset containment search on the textual data, i.e., the keywords (elements) in the query should be contained by the keywords from spatial objects.…”
Section: Challenges (mentioning)
confidence: 99%