Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1592

Optimal Transport-based Alignment of Learned Character Representations for String Similarity

Abstract: String similarity models are vital for record linkage, entity resolution, and search. In this work, we present STANCE, a learned model for computing the similarity of two strings. Our approach encodes the characters of each string, aligns the encodings using Sinkhorn Iteration (alignment is posed as an instance of optimal transport) and scores the alignment with a convolutional neural network. We evaluate STANCE's ability to detect whether two strings can refer to the same entity, a task we term alias detection.…
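The alignment step named in the abstract can be illustrated with a small, self-contained sketch of Sinkhorn iteration. This is not the authors' implementation: STANCE aligns learned character encodings, whereas the sketch below assumes a toy cost (0 for identical characters, 1 otherwise), uniform marginals, and hypothetical names (`char_cost`, `sinkhorn`, `reg`, `n_iters`) chosen only for illustration.

```python
import math


def char_cost(s, t):
    """Toy cost matrix (an assumption, not the paper's learned encodings):
    0.0 where characters match, 1.0 where they differ."""
    return [[0.0 if cs == ct else 1.0 for ct in t] for cs in s]


def sinkhorn(cost, reg=0.1, n_iters=200):
    """Entropy-regularized optimal transport with uniform marginals.

    cost: n x m list of lists of floats.
    Returns an n x m transport plan whose columns sum to 1/m and whose
    rows sum to approximately 1/n after the iterations converge.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # mass assigned to each character of the first string
    b = [1.0 / m] * m  # mass assigned to each character of the second string
    # Gibbs kernel: low cost gives a high affinity.
    K = [[math.exp(-c / reg) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


if __name__ == "__main__":
    plan = sinkhorn(char_cost("anna", "ana"))
    for row in plan:
        print(" ".join(f"{p:.3f}" for p in row))
```

In this sketch the resulting plan is a soft alignment matrix: most mass flows between matching characters. A downstream scorer (a convolutional network in the paper) would take such a matrix as input; that component is not shown here.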

Citations: cited by 16 publications (21 citation statements)
References: 33 publications
“…Finally, [22] proposed an architecture using a Multi-Layered Perceptron (MLP) to recognize toponyms, and similar neural network architecture is used by [27] for entity linking. Our work is similar to these two last studies, but our pair-wise ranking architecture is coupled with a strategy that allows us to leverage the disambiguation to millions of source and formal names by filtering irrelevant formal names out.…”
Section: Related Work (mentioning)
confidence: 99%
“…Named Entity Disambiguation (NED) [14,27] is the task of linking textual variations of Named Entities (NE) to their target names, which are usually provided as a list of formal names. For instance, while recognizing "Philip Morris" as an NE is the job of a Named Entity Recognition (NER) system, associating it to "Philip Morris International Inc (PMI)" in a list of formal names as a means of disambiguation is performed via NED.…”
Section: Introduction (mentioning)
confidence: 99%
“…FACS and Fluorescence Activated Cell Sorting). A simple alias table or string similarity extension (Tam et al, 2019) would be a clear improvement. Leveraging high precision concept extraction systems (King et al, 2020) might improve clustering even more.…”
Section: Future Directions (mentioning)
confidence: 99%
“…Even a highly enriched KB will not cover all possible name variations (especially of less popular entities), not to mention spelling mistakes or OCR errors. To address this issue, recent work has addressed this problem by including a noise detector to the entity linking system that operates at a token-level [11], or by learning and aligning character representations for string similarity [20].…”
Section: Introduction and Related Work (mentioning)
confidence: 99%