2017
DOI: 10.1007/978-3-319-53676-7_6

The WDC Gold Standards for Product Feature Extraction and Product Matching

Abstract: Finding out which e-shops offer a specific product is a central challenge for building integrated product catalogs and comparison shopping portals. Determining whether two offers refer to the same product involves extracting a set of features (product attributes) from the web pages containing the offers and comparing these features using a matching function. The existing gold standards for product matching have two shortcomings: (i) they only contain offers from a small number of e-shops and thus do not proper…

Cited by 9 publications (8 citation statements)
References 12 publications
“…Baselines. As baselines for the WDC dataset, we repeat TF-IDF cosine similarity and Paragraph2Vec experiments presented in [40]; additionally, we learn a decision tree and a random forest as baselines. The first baseline considers pair-wise matching of product descriptions for which TF-IDF vectors are calculated using the bag-of-words feature extraction method.…”
Section: Methods (mentioning)
confidence: 99%
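
The TF-IDF baseline mentioned in the statement above can be illustrated with a minimal sketch. It is not the implementation used in the cited work: the whitespace tokenizer, the smoothed IDF formula, the 0.5 threshold, the function names, and the sample offers are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for the
    # bag-of-words feature extraction mentioned in the statement.
    return text.lower().split()


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict: token -> weight) per description."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of descriptions containing each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Assumption: smoothed IDF, one common variant among several.
    idf = {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({tok: tf[tok] * idf[tok] for tok in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(tok, 0.0) for tok, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_pairs(docs, threshold=0.5):
    """Return (i, j, score) for every pair whose similarity reaches the threshold."""
    vecs = tfidf_vectors(docs)
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            score = cosine(vecs[i], vecs[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


# Hypothetical product descriptions, not taken from the WDC gold standard.
offers = [
    "apple iphone 6 16gb silver",
    "iphone 6 apple 16gb silver smartphone",
    "samsung galaxy s5 black",
]
matches = match_pairs(offers)
```

With these sample offers, the first two descriptions share most of their tokens and are paired, while the third shares none and scores zero against both. In practice the threshold would be tuned on labeled pairs rather than fixed by hand.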
“…Table 5 gives an overview of the matching results on the WDC Product Matching Gold Standard dataset. As baselines, we take TF-IDF cosine similarity and Paragraph2Vec experiments presented in [40], and decision tree and random forest explained above. Moreover, we compare results from: (i) handwritten matching rules, (ii) the GenLink algorithm, (iii) GenLinkGL, (iv) GenLinkSA and (v) GenLinkComb.…”
Section: Methods (mentioning)
confidence: 99%