1998
DOI: 10.1109/34.682181

Learning string-edit distance

Abstract: In many applications, it is necessary to determine the similarity of two strings. A widely-used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string edit distance. Our stochastic model allows us to learn a string edit distance function from a corpus of examples. We illustrate the utility of our approach by applying it to the difficult problem of l…

Cited by 641 publications (363 citation statements)
References: 23 publications
“…We also identified the diameter of variation sets, which we quantified in terms of normalized Levenshtein distance (Ristad & Yianilos, 1998). To allow comparison of edit distance across sentences of different lengths, we normalized it by dividing the raw edit distance by the length of the longest of the two sequences, which brings it into the range between 0 and 1.…”
Section: Variation Sets In Childes
Citation type: mentioning
confidence: 99%
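
The normalization described in the statement above can be sketched as follows. This is a minimal illustration, not the citing authors' code: the function names are hypothetical, unit costs are assumed for every edit, and returning 0.0 for two empty sequences is a convention chosen here. Inputs can be strings or lists of tokens.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty sequence takes i deletions
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(
                prev[j] + 1,         # delete item_a
                curr[j - 1] + 1,     # insert item_b
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: Sequence, b: Sequence) -> float:
    """Raw edit distance divided by the length of the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # assumption: two empty sequences count as identical
    return levenshtein(a, b) / longest
```

Because the raw distance can never exceed the length of the longer sequence, the quotient always falls between 0 and 1.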
“…Some recent work tried to overcome the previously mentioned drawbacks by automatically learning the primitive edit costs, rather than hand-tuning them for each domain. Several probabilistic models have been proposed to learn a stochastic ED in the form of stochastic transducers [9,1,8], conditional random fields (CRF) [7], or pair-Hidden Markov Models (pair-HMM) [5]. These models provide a probability distribution over the edit operations and thus over the string pairs.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
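
To make concrete how a distribution over edit operations induces one over string pairs, here is a minimal sketch of a memoryless stochastic edit model scored with a forward-style dynamic program. It is an illustration under stated assumptions, not the implementation of any cited paper: the function names, the two-letter alphabet, and all probability values are invented for the example, and no parameter learning is shown.

```python
import math

# Toy parameters (assumed values) over the alphabet {"a", "b"}.
# Substitutions, deletions, insertions, and the stop event sum to 1.
SUB = {("a", "a"): 0.3, ("b", "b"): 0.3, ("a", "b"): 0.05, ("b", "a"): 0.05}
DEL = {"a": 0.05, "b": 0.05}
INS = {"a": 0.05, "b": 0.05}
STOP = 0.1


def pair_probability(x, y, sub=SUB, dele=DEL, ins=INS, stop=STOP):
    """Probability of the pair (x, y), summed over every edit sequence that yields it."""
    n, m = len(x), len(y)
    # alpha[i][j]: total probability of generating the prefixes x[:i] and y[:j].
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:  # last operation deleted x[i-1]
                alpha[i][j] += dele.get(x[i - 1], 0.0) * alpha[i - 1][j]
            if j > 0:  # last operation inserted y[j-1]
                alpha[i][j] += ins.get(y[j - 1], 0.0) * alpha[i][j - 1]
            if i > 0 and j > 0:  # last operation substituted x[i-1] with y[j-1]
                alpha[i][j] += sub.get((x[i - 1], y[j - 1]), 0.0) * alpha[i - 1][j - 1]
    return alpha[n][m] * stop  # every edit sequence ends with the stop event


def stochastic_edit_distance(x, y):
    """Negative log probability of the pair; smaller means more similar."""
    return -math.log(pair_probability(x, y))
```

With these toy values, identical strings receive a higher pair probability, and therefore a smaller distance, than strings that differ. In a learned model the operation probabilities would be estimated from example string pairs rather than set by hand.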
“…The motivations that justify the learning of such a transducer are the following. First, we think that an efficient way to model a stochastic ED actually consists in viewing it as a stochastic transduction between the input X and output Y alphabets [8,9]. In other words, it means that the relation constituted by a set of (input,output ) strings can be compiled in the form of a 2-tape automaton, called a stochastic finite-state transducer.…”
Section: Introduction
Citation type: mentioning
confidence: 99%