1995
DOI: 10.1002/spe.4380250307

Finding approximate matches in large lexicons

Abstract: Approximate string matching is used for spelling correction and personal name matching. In this paper we show how to use string matching techniques in conjunction with lexicon indexes to find approximate matches in a large lexicon. We test several lexicon indexing techniques, including n‐grams and permuted lexicons, and several string matching techniques, including string similarity measures and phonetic coding. We propose methods for combining these techniques, and show experimentally that these combinations …

Citations: cited by 101 publications (69 citation statements)
References: 15 publications
“…Character n-grams have been successfully used for a long time in a wide variety of text processing problems and domains, including the following: approximate word matching (Zobel and Dart, 1995; Mustafa, 2005), string-similarity measures (Angell et al, 1983), language identification (Gottron and Lipka, 2010; Gökçay and Gökçay, 1995), authorship attribution (Kešelj et al, 2003), text compression (Wisniewski, 1987), and bioinformatics (Pavlović-Laetić et al, 2009; Cheng and Carbonell, 2007; Tomović et al, 2006).…”
Section: The N-gram Based Approach (mentioning)
confidence: 99%
“…This approach uses n-grams to discover words that match and "nearly" match target terms, then add these additional terms to the original query. This approach has wide appeal since it could be largely language independent and could be applied to various concept (word) representations such as phonemes, soundex codes [14,13] or for spelling correction [12], using differing retrieval engines [2], or as a means to summarize the content of a document [3].…”
Section: N-grams Based Query Term Expansion (mentioning)
confidence: 99%
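
The n-gram expansion described in the statement above can be sketched roughly as follows. This is a minimal illustration, not the method of the citing paper or of Zobel and Dart: the function names (`ngrams`, `build_index`, `expand_term`), the use of bigrams with `#` boundary padding, the Dice coefficient as the similarity score, and the 0.5 threshold are all assumptions chosen for the example.

```python
from collections import defaultdict


def ngrams(word, n=2):
    """Return the set of character n-grams of `word`, padded with '#' at both ends."""
    padded = "#" * (n - 1) + word.lower() + "#" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(lexicon, n=2):
    """Map each n-gram to the set of lexicon words that contain it."""
    index = defaultdict(set)
    for word in lexicon:
        for gram in ngrams(word, n):
            index[gram].add(word)
    return index


def expand_term(term, index, n=2, threshold=0.5):
    """Return (word, score) pairs for lexicon words that share enough n-grams with `term`.

    The score is the Dice coefficient over n-gram sets (an assumed choice).
    Only words sharing at least one n-gram with the term are ever scored.
    """
    term_grams = ngrams(term, n)
    shared_counts = defaultdict(int)
    for gram in term_grams:
        for word in index.get(gram, ()):
            shared_counts[word] += 1

    results = []
    for word, shared in shared_counts.items():
        dice = 2 * shared / (len(term_grams) + len(ngrams(word, n)))
        if dice >= threshold:
            results.append((word, dice))
    # Best matches first; ties broken alphabetically.
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))


lexicon = ["retrieval", "retrieve", "retrieving", "banana"]
index = build_index(lexicon)
expanded = expand_term("retrieval", index)
```

Here `expanded` holds the exact match along with the near matches that pass the threshold; in a query-expansion setting these extra terms would be appended to the original query.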
“…The principle measure used for filtering candidate terms was the edit distance, or number of single insertions, deletions, or additions needed to make one string the same as another. Further complexity may be added to this measure by applying a "cost" of additional operations, such as character transposition, or special substitutions, as with common OCR errors [13].…”
Section: Determining Nearness Of Match (mentioning)
confidence: 99%
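
For reference, an edit distance with a separately weighted transposition can be sketched as below. This is an illustrative sketch under stated assumptions, not the citing paper's implementation: it uses the standard formulation in which the unit operations are insertion, deletion, and substitution, adds adjacent-character transposition in the "optimal string alignment" style, and the parameter names `sub_cost` and `transpose_cost` are hypothetical.

```python
def edit_distance(a, b, sub_cost=1, transpose_cost=1):
    """Edit distance between strings `a` and `b` by dynamic programming.

    Insertions and deletions cost 1; substitutions cost `sub_cost`;
    swapping two adjacent characters costs `transpose_cost`.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else sub_cost
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition, e.g. "ie" <-> "ei".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + transpose_cost)
    return d[-1][-1]
```

Raising `transpose_cost` or `sub_cost` makes those operations less attractive than the alternatives; a cost table keyed on character pairs would be one way to model the "special substitutions" the statement mentions, though that is not shown here.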
“…N-gram matching has been reported to be an effective technique among various approximate matching techniques in name searching (Pfeifer et al, 1996;Zobel and Dart, 1995) and cross-lingual spelling variant matching and is an appropriate fuzzy matching technique for use with TRT.…”
Section: N-gram Matching (mentioning)
confidence: 99%