Optimal Transport-based Alignment of Learned Character Representations for String Similarity

Tam, Derek; Monath, Nicholas; Kobren, Ari; Traylor, Aaron; Das, Rajarshi; McCallum, Andrew

doi:10.18653/v1/p19-1592

Cited by 16 publications

(21 citation statements)

References 33 publications


“…Finally, [22] proposed an architecture using a Multi-Layered Perceptron (MLP) to recognize toponyms, and similar neural network architecture is used by [27] for entity linking. Our work is similar to these two last studies, but our pair-wise ranking architecture is coupled with a strategy that allows us to leverage the disambiguation to millions of source and formal names by filtering irrelevant formal names out.…”

Section: Related Work (mentioning)

confidence: 99%

“…Named Entity Disambiguation (NED) [14,27] is the task of linking textual variations of Named Entities (NE) to their target names, which are usually provided as a list of formal names. For instance, while recognizing "Philip Morris" as an NE is the job of a Named Entity Recognition (NER) system, associating it to "Philip Morris International Inc (PMI)" in a list of formal names as a means of disambiguation is performed via NED.…”

Section: Introduction (mentioning)

confidence: 99%

“…The tf-idf model has two tasks; to generate feature vectors for the deep learning model and to set a threshold for limiting the formal names when using the system at inference time. We test the system on four datasets for alias detection [27] and compare the results with several baselines as well as a state-of-the-art NED system.…”

Section: Introduction (mentioning)

confidence: 99%


Named Entity Disambiguation at Scale

Aghaebrahimian, Cieliebak. 2020. Lecture Notes in Computer Science


Named Entity Disambiguation (NED) is a crucial task in many Natural Language Processing applications such as entity linking, record linkage, knowledge base construction, or relation extraction, to name a few. The task in NED is to map textual variations of a named entity to its formal name. It has been shown that parameter-less models for NED do not generalize to other domains very well. On the other hand, parametric learning models do not scale well when the number of formal names expands above the order of thousands or more. To tackle this problem, we propose a deep architecture with superior performance on NED and introduce a strategy to scale it to hundreds of thousands of formal names. Our experiments on several datasets for alias detection demonstrate that our system is capable of obtaining superior results with a large margin compared to other state-of-the-art systems.


“…FACS and Fluorescence Activated Cell Sorting). A simple alias table or string similarity extension (Tam et al, 2019) would be a clear improvement. Leveraging high precision concept extraction systems (King et al, 2020) might improve clustering even more.…”

Section: Future Directions (mentioning)

confidence: 99%

The impact of preprint servers in the formation of novel ideas

Satish, Yao, Drozdov, et al. 2020. Preprint


We study whether novel ideas in biomedical literature appear first in preprints or traditional journals. We develop a Bayesian method to estimate the time of appearance for a phrase in the literature, and apply it to a number of phrases, both automatically extracted and suggested by experts. We see that presently most phrases appear first in the traditional journals, but there is a number of phrases with the first appearance on preprint servers. A comparison of the general composition of texts from bioRxiv and traditional journals shows a growing trend of bioRxiv being predictive of traditional journals. We discuss the application of the method for related problems.


“…Even a highly enriched KB will not cover all possible name variations (especially of less popular entities), not to mention spelling mistakes or OCR errors. To address this issue, recent work has addressed this problem by including a noise detector to the entity linking system that operates at a token-level [11], or by learning and aligning character representations for string similarity [20].…”

Section: Introduction and Related Work (mentioning)

confidence: 99%

A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching

Ardanuy, Hosseini, McDonough, et al. 2020. Proceedings of the 28th International Conference on Advances in Geographic Information Systems


Recognizing toponyms and resolving them to their real-world referents is required to provide advanced semantic access to textual data. This process is often hindered by the high degree of variation in toponyms. Candidate selection is the task of identifying the potential entities that can be referred to by a previously recognized toponym. While it has traditionally received little attention, candidate selection has a significant impact on downstream tasks (i.e. entity resolution), especially in noisy or non-standard text. In this paper, we introduce a deep learning method for candidate selection through toponym matching, using state-of-the-art neural network architectures. We perform an intrinsic toponym matching evaluation based on several datasets, which cover various challenging scenarios (cross-lingual and regional variations, as well as OCR errors) and assess its performance in the context of geographical candidate selection in English and Spanish.

CCS Concepts: Computing methodologies → Information extraction; Natural language processing. Information systems → Digital libraries and archives.
