2000
DOI: 10.1007/s007780000029

One-dimensional and multi-dimensional substring selectivity estimation

Abstract: With the increasing importance of XML, LDAP directories, and text-based information sources on the Internet, there is an ever-greater need to evaluate queries involving (sub)string matching. In many cases, matches need to be on multiple attributes/dimensions, with correlations between the multiple dimensions. Effective query optimization in this context requires good selectivity estimates. In this paper, we use pruned count-suffix trees (PSTs) as the basic data structure for substring selectivity estimation. F…
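The abstract names pruned count-suffix trees (PSTs) as the core structure. As a rough illustration only, the sketch below stores the counts a PST would hold in a flat Python dict keyed by substring rather than in real suffix-tree nodes. The function names `build_pst` and `pst_selectivity`, the threshold value, and the sample strings are hypothetical. Each entry records how many strings contain the substring, and entries below the pruning threshold are discarded, so a lookup is exact when the substring survived pruning and unavailable otherwise.

```python
from collections import Counter


def build_pst(strings, prune_threshold):
    """Hypothetical sketch of a pruned count-suffix tree (PST).

    Instead of real suffix-tree nodes, every root-to-node path is stored as a
    dict key mapping a substring to the number of strings that contain it.
    Entries whose count falls below prune_threshold are dropped.
    """
    counts = Counter()
    for s in strings:
        # Count each distinct substring once per string (a presence count).
        distinct = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
        counts.update(distinct)
    return {sub: c for sub, c in counts.items() if c >= prune_threshold}


def pst_selectivity(pst, query, n_strings):
    """Fraction of strings containing query, or None if query was pruned."""
    count = pst.get(query)
    return None if count is None else count / n_strings


names = ["jones", "johnson", "jackson", "smith"]  # made-up sample data
pst = build_pst(names, prune_threshold=2)
print(pst_selectivity(pst, "son", len(names)))    # 0.5
print(pst_selectivity(pst, "jones", len(names)))  # None: pruned away
```

Because a string containing a substring also contains every prefix of it, a prefix never has a smaller count than its extensions, so threshold pruning leaves a set closed under prefixes, i.e. still a valid tree. Substrings that were pruned away need an estimator that combines the surviving pieces.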

Cited by 27 publications (31 citation statements)
References 18 publications (29 reference statements)
“…These include techniques for selectivity estimation of select-join queries [71] or spatial queries [8], using query feedback to modify stored curve-fitting/parametric information for better selectivity estimation [16], selectivity estimation for alphanumeric/string data in 1-dimensional [43,45] and multi-dimensional environments [41,77], identification of quantiles [6,53,54] and their dynamic maintenance with a priori guarantees [29], approximate query answering for aggregate join queries [4], select-join queries [25], and within the general framework of on-line aggregation [33,32,51], computing frequencies of high-frequency items in a stream [52], and others. Despite their suboptimality compared to some of these techniques on the corresponding problems, histograms remain the method of choice, due to their overall effectiveness and wide applicability.…”
Section: Competitors of Histograms (mentioning)
confidence: 99%
“…We compare our new regression tree combination estimator, CRT, with several estimators from the literature, which we discussed above: Markov estimators over both q-gram tables (ME) and suffix trees (ME_ST), the latter referred to as the maximal overlap method in [JKNS00], as well as the QG estimator. We also consider two other weighted combination estimators over suffix trees: WCS and WCIS.…”
Section: Estimation Techniques Compared (mentioning)
confidence: 99%
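The maximal overlap (MO) idea behind ME_ST can be sketched on a substring-count dictionary of the kind a PST provides. This is a simplified greedy variant written for illustration, not necessarily the exact parsing procedure of [JKNS00]. The function name `mo_estimate`, the toy counts, and the choice to return `None` when a single character is missing are all assumptions.

```python
def mo_estimate(pst, query, n_strings):
    """Greedy maximal-overlap estimate of Pr(query); a simplified sketch.

    pst maps substrings to presence counts and is assumed closed under
    substrings (as a pruned count-suffix tree is). The query is parsed into
    pieces g_1..g_l found in pst; each new piece starts as far left as possible
    (maximal overlap with the part already covered), and the estimate is
    Pr(g_1) * prod_i Pr(g_i) / Pr(overlap_i).
    Returns None if some character of query is absent from pst (assumption).
    """
    def pr(s):
        return pst[s] / n_strings

    estimate = None
    covered = 0  # query[:covered] is already explained by earlier pieces
    while covered < len(query):
        piece = None
        for start in range(covered + 1):                # leftmost start = max overlap
            for end in range(len(query), covered, -1):  # longest extension first
                if query[start:end] in pst:
                    piece = (start, end)
                    break
            if piece is not None:
                break
        if piece is None:
            return None  # query[covered] itself was pruned; no estimate here
        start, end = piece
        if estimate is None:
            estimate = pr(query[start:end])
        else:
            overlap = query[start:covered]
            # Pr(g | overlap) = Pr(g) / Pr(overlap), since any string containing
            # g also contains its substring overlap; empty overlap = independence.
            estimate *= pr(query[start:end]) / (pr(overlap) if overlap else 1.0)
        covered = end
    return estimate


# Hypothetical toy counts over 4 strings, closed under substrings.
toy_pst = {"j": 3, "o": 3, "n": 3, "jo": 2, "on": 2}
print(mo_estimate(toy_pst, "jon", 4))  # about 0.333 (1/3)
```

With these toy counts, "jon" is not stored, so it is parsed into "jo" and "on" overlapping on "o", giving Pr(jo) · Pr(on) / Pr(o) = 0.5 · 0.5 / 0.75 = 1/3.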
“…Jagadish et al. [JKNS00] improve the estimation step by relaxing the independence assumption, relying instead on the Markovian "short memory" assumption. According to this assumption, the probability of an attribute value v containing a substring s_{i+1} depends only on v containing substring s_i (and not on the earlier substrings).…”
Section: Related Work (mentioning)
confidence: 99%
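One common way to instantiate the short-memory assumption is over overlapping q-grams: the query is split into consecutive q-grams, and each gram is conditioned only on the (q-1)-character overlap it shares with its predecessor. Since any string containing a gram also contains that overlap, Pr(gram | overlap) = Pr(gram) / Pr(overlap). The sketch below assumes this q-gram formulation; `qgram_counts`, `markov_estimate`, and the sample strings are hypothetical.

```python
from collections import Counter


def qgram_counts(strings, q):
    """Presence counts for every gram of length 1..q (how many strings contain it)."""
    counts = Counter()
    for s in strings:
        counts.update({s[i:i + k] for k in range(1, q + 1) for i in range(len(s) - k + 1)})
    return counts


def markov_estimate(counts, query, q, n_strings):
    """Markov (short-memory) estimate of the fraction of strings containing query.

    Pr(query) ~ Pr(g_1) * prod_{i>=2} Pr(g_i) / Pr(o_i), where g_i are the
    overlapping q-grams of query and o_i is the (q-1)-character overlap that
    g_i shares with g_{i-1}.
    """
    def pr(s):
        return counts[s] / n_strings  # Counter returns 0 for unseen grams

    if len(query) <= q:
        return pr(query)  # short queries are looked up directly
    estimate = pr(query[:q])
    for i in range(1, len(query) - q + 1):
        gram, overlap = query[i:i + q], query[i:i + q - 1]
        if not overlap:  # q == 1: no overlap, i.e. plain independence
            estimate *= pr(gram)
            continue
        if pr(overlap) == 0:
            return 0.0
        estimate *= pr(gram) / pr(overlap)
    return estimate


names = ["jones", "johnson", "jackson", "smith"]  # made-up sample data
counts = qgram_counts(names, q=2)
print(markov_estimate(counts, "john", 2, len(names)))  # about 0.0833 (1/12)
```

Here "john" is chained as Pr(jo) · Pr(oh)/Pr(o) · Pr(hn)/Pr(h) = 0.5 · (0.25/0.75) · (0.25/0.5) = 1/12, while the true fraction is 1/4 (only "johnson" matches), which illustrates how the short-memory assumption trades accuracy for a compact summary.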