2021
DOI: 10.48550/arxiv.2105.14403
Preprint

Re-evaluating Word Mover's Distance

Abstract: The word mover's distance (WMD) is a fundamental technique for measuring the similarity of two documents. As the crux of WMD, it can take advantage of the underlying geometry of the word space by employing an optimal transport formulation. The original study on WMD reported that WMD outperforms classical baselines such as bag-of-words (BOW) and TF-IDF by significant margins in various datasets. In this paper, we point out that the evaluation in the original study could be misleading. We re-evaluate the perform…

Citations: Cited by 4 publications (7 citation statements)
References: References 49 publications
“…(Wang et al, 2020) replace assumption that documents' BOWs have the same measure to solve Kantorovich problem of optimal transport with the usage of Wasserstein-Fisher-Rao distance between documents based on unbalanced optimal transport principles. The work of (Sato et al, 2021) is especially significant for further discussion. The authors re-evaluate the performances of WMD and the classical baselines and find that once the data gets L1 or L2 normalization, the performance of other classical semantic similarity measures becomes comparable with WMD.…”
Section: Related Work (mentioning)
confidence: 99%
“…To assure better reproducibility, we work with the datasets presented in (Kusner et al, 2015) and (Sato et al, 2021) 1 . For the evaluation, we use six datasets that we believe to be diverse and illustrative enough for the aims of this discussion.…”
Section: Datasets (mentioning)
confidence: 99%
“…In 2021, researchers such as Ryoma Sato conducted a more in-depth study of WMD and found that the evaluation method in the original study of WMD has certain errors. The similarity calculation can be made more accurate if appropriate preprocessing, namely L1 normalization, is used [7].…”
Section: Text Similarity (mentioning)
confidence: 99%
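
The citation statements above attribute the improved baselines to L1 or L2 normalization of the document vectors. As a rough illustration of why this can matter, the sketch below compares the L1 distance between raw bag-of-words counts with the same distance after L1 normalization. The toy documents and the helper names (`bow`, `l1_normalize`, `l1_distance`) are assumptions made for this example, not the setup or code of the cited paper.

```python
from collections import Counter

def bow(tokens, vocab):
    """Raw bag-of-words count vector over a fixed vocabulary (hypothetical helper)."""
    counts = Counter(tokens)
    return [float(counts[w]) for w in vocab]

def l1_normalize(vec):
    """Scale a vector so its absolute values sum to 1; an all-zero vector is returned unchanged."""
    total = sum(abs(x) for x in vec)
    return [x / total for x in vec] if total else list(vec)

def l1_distance(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Toy documents (assumed for illustration only).
doc_a = "the cat sat".split()
doc_b = doc_a * 4                      # same content as doc_a, four times as long
doc_c = "dogs bark loudly".split()     # unrelated content, same length as doc_a
vocab = sorted(set(doc_a) | set(doc_b) | set(doc_c))

vec_a, vec_b, vec_c = (bow(d, vocab) for d in (doc_a, doc_b, doc_c))

# On raw counts, document length dominates the distance.
raw_ab = l1_distance(vec_a, vec_b)
raw_ac = l1_distance(vec_a, vec_c)

# After L1 normalization, only the word proportions are compared.
norm_ab = l1_distance(l1_normalize(vec_a), l1_normalize(vec_b))
norm_ac = l1_distance(l1_normalize(vec_a), l1_normalize(vec_c))

print("raw:       ", raw_ab, raw_ac)
print("normalized:", norm_ab, norm_ac)
```

In this toy case the raw counts place `doc_a` closer to the unrelated `doc_c` than to its own repetition `doc_b`, while the normalized vectors reverse that ranking. This shows only the length effect on a made-up example; it says nothing about the magnitude of the effect on the datasets used in the paper.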