2001
DOI: 10.1016/s0306-4379(01)00042-4

Learning object identification rules for information integration

Citations: cited by 231 publications (149 citation statements)
References: 35 publications
“…These string metrics have been developed and applied in different scientific fields like statistics, for probabilistic record linkage [6], database, for record matching [7], Artificial Intelligence, for supervised learning [8], and Biology, for identifying common molecular subsequences [9]. In the current paper we have considered the Levenstein [10] distance, which counts the insertions and deletions needed to match two strings, the Needleman-Wunsch [11] distance, which assigns a different cost on the edit operations, the Smith-Waterman [9], which additionally uses an alphabet mapping to costs and the Monge-Elkan [7], which uses variable costs depending on the substring gaps between the words.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
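
To make the first metric named in the statement above concrete, here is a minimal sketch of a unit-cost edit distance. It assumes the standard formulation of Levenshtein distance, which counts substitutions as well as insertions and deletions; the function name `levenshtein` is illustrative and is not taken from the citing paper, which may use a different variant or cost scheme.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance between a and b.

    Assumption: standard Levenshtein costs, i.e. insertion, deletion,
    and substitution each cost 1, and a matching character costs 0.
    """
    if len(a) < len(b):
        a, b = b, a  # keep the stored row as short as possible

    # previous[j] = distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

Needleman-Wunsch and Smith-Waterman follow the same dynamic-programming pattern but replace the fixed unit costs with configurable gap and character-pair costs.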
“…Thus we could not resist but to evaluate it with classical benchmarks found in literature like the ones in [7,8,24,20]. The list of the datasets used can be found in Table 2 as well as the number of strings that each dataset includes.…”
Section: Census and Field Matching (citation type: mentioning)
confidence: 99%
“…[table residue listing frameworks: BN [38], MOMA [55], SERF [5], Active Atlas [53,54], MARLIN [11,12], Multiple Classifier System [62], Operator Trees [13], TAILOR [24], FEBRL [18,17], STEM [36], Context Based Framework [16]; label: Training-based] …between two entities. The previously proposed approaches mostly assume that corresponding attributes from the input datasets have been determined beforehand, either manually or with the help of schema matching.…”
Section: Matchers (citation type: mentioning)
confidence: 99%
“…The use of supervised (training-based) approaches or learners aims at automating the process of entity matching to reduce the required manual effort. Training-based approaches, e.g., Naïve Bayes [49], logistic regression [46], Support Vector Machine (SVM) [11,43,49] or decision trees [63,29,49,53,54,56] have so far been used for some subtasks, e.g., determining suitable parameterizations for matchers or adjusting combination functions parameters (weights for matchers, offsets). However, training-based approaches require suitable training data and providing such data typically involves manual effort.…”
Section: Combination Of Matchers (citation type: mentioning)
confidence: 99%
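
As a rough illustration of what "determining suitable parameterizations for matchers" from training data can look like, the sketch below learns a single match threshold on a similarity score from labeled pairs. This is a one-level decision tree (a decision stump) chosen for brevity; it is an assumption made for illustration and not the method of any of the cited systems. The name `learn_threshold` and the sample data are hypothetical.

```python
def learn_threshold(scored_pairs):
    """Pick the similarity threshold that minimizes training errors.

    scored_pairs: list of (similarity, is_match) tuples, where similarity
    is a float and is_match is the manually assigned label.
    A pair is predicted to match when similarity >= threshold.
    Returns None if no training data is given.
    """
    best_threshold, best_errors = None, None
    # Only observed similarity values need to be tried as thresholds.
    for threshold in sorted({s for s, _ in scored_pairs}):
        errors = sum((s >= threshold) != label for s, label in scored_pairs)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


# Hypothetical labeled training pairs: (similarity score, is_match)
training = [
    (0.95, True), (0.90, True), (0.85, True),
    (0.70, False), (0.60, False), (0.30, False),
]
print(learn_threshold(training))
```

The labeled pairs are exactly the manual effort the statement above refers to: the threshold is only as good as the training data supplied.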