Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data 2008
DOI: 10.1145/1376616.1376655

Cost-based variable-length-gram selection for string collections to support approximate queries efficiently

Abstract: Approximate queries on a collection of strings are important in many applications such as record linkage, spell checking, and Web search, where inconsistencies and errors exist in data as well as queries. Several existing algorithms use the concept of "grams," which are substrings of strings used as signatures for the strings to build index structures. A recently proposed technique, called VGRAM, improves the performance of these algorithms by using a carefully chosen dictionary of variable-length grams based …
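
As a rough illustration of the gram idea the abstract describes, the sketch below extracts fixed-length grams from each string and builds an inverted index from gram to string ids. It is a minimal sketch under stated assumptions, not the VGRAM technique itself: it uses fixed-length grams rather than a variable-length gram dictionary, and the names `qgrams`, `build_index`, `candidates`, and the `min_shared` threshold are hypothetical choices made for this example.

```python
from collections import defaultdict


def qgrams(s, q=2):
    """Return the overlapping length-q substrings of s (fixed-length grams)."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def build_index(strings, q=2):
    """Build an inverted index: gram -> set of ids of strings containing it."""
    index = defaultdict(set)
    for sid, s in enumerate(strings):
        for g in qgrams(s, q):
            index[g].add(sid)
    return index


def candidates(index, query, q=2, min_shared=1):
    """Return ids of strings sharing at least min_shared distinct grams
    with the query. min_shared is an illustrative threshold (assumption)."""
    counts = defaultdict(int)
    for g in set(qgrams(query, q)):
        for sid in index.get(g, ()):
            counts[sid] += 1
    return sorted(sid for sid, c in counts.items() if c >= min_shared)


strings = ["stick", "stich", "such", "stuck"]
index = build_index(strings)
print(candidates(index, "stick", min_shared=2))
```

Strings that share few grams with the query are pruned before any exact distance computation; the surviving candidates would still need to be verified with the chosen similarity or distance function.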

Cited by 77 publications (62 citation statements)
References: 16 publications
“…[28] is a survey on approximate string matching methods. Recent progress in the literature that is related to similarity joins includes similarity joins with various similarity or distance functions [29], [19], [5], [4], [8], [11], [12], [30], [31], [22], [32], [33], [21], similarity selection [34], [3], and selectivity estimation [35], [36], [37], [38], [39], [40].…”
Section: Preprocessing Time (mentioning)
confidence: 99%
“…Filters based on mismatching q-grams are proposed to further speed up the query processing [8]. Variable-length grams are also proposed [11], [12], which can be easily integrated into other algorithms and help to achieve better performance.…”
Section: Related Work (mentioning)
confidence: 99%