2008 IEEE 24th International Conference on Data Engineering 2008
DOI: 10.1109/icde.2008.4497434

Efficient Merging and Filtering Algorithms for Approximate String Searches

Abstract: We study the following problem: how to efficiently find in a collection of strings those similar to a given query string? Various similarity functions can be used, such as edit distance, Jaccard similarity, and cosine similarity. This problem is of great interest to a variety of applications that need high real-time performance, such as data cleaning, query relaxation, and spellchecking. Several algorithms have been proposed based on the idea of merging inverted lists of grams generated from the st…
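
To make the idea of "merging inverted lists of grams" concrete, here is a minimal Python sketch of the general gram-based approach for edit distance. It is not the paper's own algorithms: the function names (`qgrams`, `build_index`, `search`), the choice of unpadded 2-grams, and the sample data are all assumptions made for illustration. The sketch relies on the well-known count bound that a string within edit distance k of a query Q shares at least (|Q| - q + 1) - k*q grams with Q, and it verifies every surviving candidate with an exact edit-distance computation.

```python
from collections import Counter, defaultdict


def qgrams(s, q=2):
    """Return the overlapping q-grams of s (no padding; an illustrative choice)."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def build_index(strings, q=2):
    """Inverted index: gram -> {string id: occurrences of the gram in that string}."""
    index = defaultdict(Counter)
    for sid, s in enumerate(strings):
        for g in qgrams(s, q):
            index[g][sid] += 1
    return index


def edit_distance(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def search(strings, index, query, k, q=2):
    """Return the strings within edit distance k of query, in sorted order."""
    # Count bound: each edit operation can destroy at most q grams.
    threshold = (len(query) - q + 1) - k * q
    if threshold <= 0:
        # The bound cannot prune anything, so fall back to scanning everything.
        candidates = range(len(strings))
    else:
        shared = Counter()
        # "Merge" the inverted lists of the query's grams, counting shared
        # grams per string id (bag semantics via min of multiplicities).
        for g, cq in Counter(qgrams(query, q)).items():
            for sid, cs in index.get(g, {}).items():
                shared[sid] += min(cq, cs)
        candidates = [sid for sid, c in shared.items() if c >= threshold]
    # The count filter only prunes; verify each candidate exactly.
    return sorted(strings[sid] for sid in candidates
                  if edit_distance(strings[sid], query) <= k)
```

For example, with the hypothetical collection `["merging", "merge", "filtering", "filter", "margin"]`, calling `search(strings, build_index(strings), "mergin", k=1)` keeps "merging", "merge", and "margin" as candidates after the merge step and returns `['margin', 'merging']` after verification. This sketch scans every inverted list in full, which is the simplest way to perform the merge and makes no attempt to be efficient.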

Citations: cited by 229 publications (277 citation statements)
References: 7 publications
“…Progressive computation is orthogonal to the filtering techniques and we do not consider progressive computation in comparing different filters in our work. Different filters have been used in an ad hoc fashion to accelerate the membership checking, e.g., the length filter is used in [19], [20]. However, none of the previous work exploits the many existing string filters and uses them systematically.…”
Section: Related Work (mentioning)
confidence: 99%
“…In [19], efficient exact algorithms are proposed to conduct approximate string checking based on merging token inverted lists. In [6], the ISH (Inverted Signature-based Hashtable) structure is presented with the focus on reducing the filter checking time.…”
Section: Related Work (mentioning)
confidence: 99%
“…Several existing work studies the similarity search problem [24], [25], [26], [27], which returns the records in a collection whose similarity with the query exceeds a given threshold. Based on the inverted list framework, [26] proposes an efficient principle to skip records when accessing inverted lists.…”
Section: Related Work (mentioning)
confidence: 99%