Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data 2008
DOI: 10.1145/1376616.1376742

Incorporating string transformations in record matching

Abstract: Today's record matching infrastructure does not allow a flexible way to account for synonyms such as "Robert" and "Bob" which refer to the same name, and more general forms of string transformations such as abbreviations. We expand the problem of record matching to take such user-defined string transformations as input. These transformations coupled with an underlying similarity function are used to define the similarity between two strings. We demonstrate the effectiveness of this approach via a fuzzy match o…
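
As a rough illustration of the idea in the abstract (user-defined transformations combined with an underlying similarity function), the sketch below scores two strings by the best similarity over their transformed variants. It is a simplification made under stated assumptions and is not the paper's algorithm: the rule table `RULES`, the helper names, the restriction to single-token rewrites, and the choice of Jaccard similarity over token sets are all hypothetical.

```python
from itertools import product

# Hypothetical user-supplied transformation rules: token -> equivalent tokens.
# Only single-token rewrites are modeled here (an assumption of this sketch).
RULES = {"bob": {"robert"}, "robert": {"bob"}}


def jaccard(a, b):
    """Jaccard similarity between two token lists, treated as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def variants(tokens, rules):
    """All token lists obtained by keeping or rewriting each token."""
    options = [[tok] + sorted(rules.get(tok, ())) for tok in tokens]
    return [list(choice) for choice in product(*options)]


def transformed_similarity(s, t, rules):
    """Best underlying similarity over all transformed variants of s and t."""
    s_tokens = s.lower().split()
    t_tokens = t.lower().split()
    return max(
        jaccard(vs, vt)
        for vs in variants(s_tokens, rules)
        for vt in variants(t_tokens, rules)
    )
```

With the rule table above, "Bob Smith" and "Robert Smith" match exactly once "bob" is rewritten to "robert", whereas without any rules the two strings share only one of three distinct tokens. Enumerating every variant grows exponentially with the number of applicable rules, so this brute-force form is suitable only for short strings.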

Cited by 13 publications
(12 citation statements)
References 4 publications
“…We ran the algorithm on all the datasets with different lp values. Figures 2(c)-2(f) show various measures on the DBLP dataset for τ ∈ [1,3]. Results on other datasets are similar.…”
Section: Effect Of Prefix Length (mentioning)
confidence: 63%
“…A recent experimental study of their relative effectiveness is presented in [9]. Some new types of record recently studied include utilizing group information [30], combining multiple similarity functions [11], leveraging aggregate constraints [13], and considering string transformation rules [3].…”
Section: Related Work (mentioning)
confidence: 99%
“…We compare with them in more detail in the next subsection. The learned rules can be used to identify more coreferent pairs [18,19].…”
Section: Related (mentioning)
confidence: 99%
“…Lines 12-13 calculate the scores for each candidate rule. Then we iteratively select the TopRule which has the maximum score (Lines 16-20), and withdraw support from other conflicting candidate rules by the procedure UpdateRules. We repeat this process until k high-quality rules are found.…”
Section: Select the Top-k Rules (mentioning)
confidence: 99%