One-dimensional and multi-dimensional substring selectivity estimation

Jagadish, H. V.; Kapitskaia, Olga; Ng, Raymond T.; Srivastava, Divesh

doi:10.1007/s007780000029

Cited by 27 publications

(31 citation statements)

References 18 publications

(29 reference statements)

“…These include techniques for selectivity estimation of select-join queries [71] or spatial queries [8], using query feedback to modify stored curve-fitting/parametric information for better selectivity estimation [16], selectivity estimation for alphanumeric/string data in 1-dimensional [43,45] and multi-dimensional environments [41,77], identification of quantiles [6,53,54] and their dynamic maintenance with a priori guarantees [29], approximate query answering for aggregate join queries [4], select-join queries [25], and within the general framework of on-line aggregation [33,32,51], computing frequencies of high-frequency items in a stream [52], and others. Despite their suboptimality compared to some of these techniques on the corresponding problems, histograms remain the method of choice, due to their overall effectiveness and wide applicability.…”

Section: Competitors Of Histograms (mentioning)

confidence: 99%

The History of Histograms (abridged)

Ioannidis

2003

Proceedings 2003 VLDB Conference

326

230

The history of histograms is long and rich, full of detailed information in every step. It includes the course of histograms in different scientific fields, the successes and failures of histograms in approximating and compressing information, their adoption by industry, and solutions that have been given on a great variety of histogram-related problems. In this paper and in the same spirit of the histogram techniques themselves, we compress their entire history (including their "future history" as currently anticipated) in the given/fixed space budget, mostly recording details for the periods, events, and results with the highest (personally-biased) interest. In a limited set of experiments, the semantic distance between the compressed and the full form of the history was found relatively small!

Prehistory

The word 'histogram' is of Greek origin, as it is a composite of the words 'isto-s' (ιστός) (= 'mast', also means 'web' but this is not relevant to this discussion) and 'gram-ma' (γράμμα) (= 'something written'). Hence, it should be interpreted as a form of writing consisting of 'masts', i.e., long shapes vertically standing, or something similar. It is not, however, a

“…We compare our new regression tree combination estimator, CRT, with several estimators from the literature, which we discussed above: Markov estimators over both q-gram tables (ME) and suffix trees (ME_ST), referred to as the maximal overlap method in [JKNS00], as well as the QG estimator. We also consider two other weighted combination estimators over suffix trees: WCS and WCIS.…”

Section: Estimation Techniques Compared (mentioning)

confidence: 99%

“…Jagadish et al. [JKNS00] improve the estimation step by relaxing the independence assumption, relying instead on the Markovian "short memory" assumption. According to this assumption, the probability of an attribute value v containing a substring s_{i+1} only depends on v containing substring s_i (and not on the earlier substrings).…”

Section: Related Work (mentioning)

confidence: 99%
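
Written out, the short-memory assumption quoted above amounts to a chain factorization. The notation below is an assumption made here for illustration, not notation taken from [JKNS00]: Pr(s_i) stands for the fraction of attribute values containing substring s_i, and s_1, …, s_k are the substrings the query string is parsed into.

```latex
% Markov ("short memory") combination: each factor conditions only on the previous substring
\Pr(s_1, s_2, \dots, s_k) \;\approx\; \Pr(s_1) \prod_{i=1}^{k-1} \Pr(s_{i+1} \mid s_i)

% Independence combination, for comparison: no conditioning at all
\Pr(s_1, s_2, \dots, s_k) \;\approx\; \prod_{i=1}^{k} \Pr(s_i)
```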

“…The most frequent class of string predicates, called wildcard predicates, are of the form R.A like %s%, where A is a string-valued (varchar) attribute of a relation R. Several techniques have been proposed for estimating the selectivity of wildcard predicates (e.g., [KVI96], [JKNS00]). These techniques build summary structures (e.g., pruned suffix trees or Markov tables) recording the "frequency" of carefully selected strings.…”

Section: Introduction (mentioning)

confidence: 99%

“…At run time, the estimation of the selectivity of a predicate R.A like %s% involves two parts: (i) parsing the query string s into possibly overlapping substrings s_1, …, s_k whose (exact) frequencies, and hence the associated selectivity of each substring predicate R.A like %s_i%, can be looked up in the summary structure, and (ii) combining the (exact) selectivities of the substring predicates to estimate the selectivity of the original query predicate. To combine the selectivity of the substring predicates, existing techniques mainly rely either on the independence assumption [KVI96] (the selectivity of the R.A like %s_i% predicate is independent of that associated with s_j, for all j ≠ i), or on the Markov assumption [JKNS00] (the selectivity of the R.A like %s_i% predicate is independent of all R.A like %s_j% except when j = i-1).…”

Section: Introduction (mentioning)

confidence: 99%
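
The two-part procedure in the statement above (parse the query string into substrings, then combine their looked-up selectivities) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation from [KVI96] or [JKNS00]: the summary structure is a plain, unpruned table of all substrings up to length `q` (real systems prune it), the parse uses fixed-length windows, and the names `build_table`, `estimate_independence`, and `estimate_markov` are hypothetical.

```python
from collections import Counter


def build_table(values, q):
    """Count, for each substring of length 1..q, how many values contain it.

    Assumption: an unpruned table stands in for the pruned suffix tree or
    Markov table used by the cited techniques.
    """
    table = Counter()
    for v in values:
        subs = {
            v[i:j]
            for i in range(len(v))
            for j in range(i + 1, min(i + q, len(v)) + 1)
        }
        table.update(subs)  # each value counts once per distinct substring
    return table


def estimate_independence(query, table, n_rows, q):
    """Multiply selectivities of non-overlapping pieces of length <= q."""
    est = 1.0
    for i in range(0, len(query), q):
        est *= table[query[i:i + q]] / n_rows
    return est


def estimate_markov(query, table, n_rows, q):
    """Chain overlapping length-q windows, conditioning each on its overlap
    with the previous window (the "short memory" assumption)."""
    if len(query) <= q:
        return table[query] / n_rows  # exact lookup, nothing to combine
    est = table[query[:q]] / n_rows
    for i in range(1, len(query) - q + 1):
        piece = query[i:i + q]
        overlap = query[i:i + q - 1]  # shared with the previous window
        if table[piece] == 0:
            return 0.0  # a piece that never occurs rules out the query
        if overlap:
            est *= table[piece] / table[overlap]  # Pr(piece | overlap)
        else:
            est *= table[piece] / n_rows  # q == 1: no overlap to condition on
    return est


if __name__ == "__main__":
    values = ["database", "databank", "datum", "metadata"]
    table = build_table(values, q=3)
    print(estimate_independence("datab", table, len(values), q=3))
    print(estimate_markov("datab", table, len(values), q=3))
```

Dividing by the overlap count is what distinguishes the Markov combination here: the shared characters between consecutive windows are counted once rather than twice, which is the effect the quoted passage attributes to conditioning only on the preceding substring.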
