TABi: Type-Aware Bi-Encoders for Open-Domain Entity Retrieval

Leszczynski, Megan; Fu, Daniel; Chen, Mayee; Ré, Christopher

doi:10.18653/v1/2022.findings-acl.169

Cited by 5 publications

(2 citation statements)

References 17 publications


“…Architecture We use Dual-Encoder architecture to model context-type pairs similar to entity linking (Zhang et al, 2022a,b; Leszczynski et al, 2022). As shown in Figure 2 (a) and (b), Dual-Encoder consists of two independent transformer encoders, called Context-Encoder E_ctxt and Type-Encoder E_type.…”

Section: Context-type Semantic Alignment (mentioning)

confidence: 99%

Alignment Precedes Fusion: Open-Vocabulary Named Entity Recognition as Context-Type Semantic Matching

Jin, Cao, et al. 2023

Findings of the Association for Computational Linguistics: EMNLP 2023


Despite the significant progress in developing named entity recognition models, scaling to novel-emerging types still remains challenging in real-world scenarios. Continual learning and zero-shot learning approaches have been explored to handle novel-emerging types with less human supervision, but they have not been as successfully adopted as supervised approaches. Meanwhile, humans possess a much larger vocabulary size than these approaches and have the ability to learn the alignment between entities and concepts effortlessly through natural supervision. In this paper, we consider a more realistic and challenging setting called open-vocabulary named entity recognition (OVNER) to imitate human-level ability. OVNER aims to recognize entities in novel types by their textual names or descriptions. Specifically, we formulate OVNER as a semantic matching task and propose a novel and scalable two-stage method called Context-Type SemAntiC Alignment and FusiOn (CACAO). In the pre-training stage, we adopt Dual-Encoder for context-type semantic alignment and pre-train Dual-Encoder on 80M context-type pairs which are easily accessible through natural supervision. In the fine-tuning stage, we use Cross-Encoder for context-type semantic fusion and fine-tune Cross-Encoder on base types with human supervision. Experimental results show that our method outperforms the previous state-of-the-art methods on three challenging OVNER benchmarks by 9.7%, 9.5%, and 1.8% F1-score of novel types. Moreover, CACAO also demonstrates its flexible transfer ability in cross-domain NER.


“…Wikidata Type system. Prior work demonstrated that types can benefit EL systems (Ling et al, 2015; Raiman and Raiman, 2018; Leszczynski et al, 2022). We introduce a new formulation for coarse and fine entity typing, utilizing rich structural knowledge in Wikidata.…”

Section: Annotate Entities (mentioning)

confidence: 99%

Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark

Xu, Shan, Li, et al. 2022

Preprint


Modern Entity Linking (EL) systems entrench a popularity bias, yet there is no dataset focusing on tail and emerging entities in languages other than English. We present Hansel, a new benchmark in Chinese that fills the vacancy of non-English few-shot and zero-shot EL challenges. The test set of Hansel is human annotated and reviewed, created with a novel method for collecting zero-shot EL datasets. It covers 10K diverse documents in news, social media posts and other web articles, with Wikidata as its target Knowledge Base. We demonstrate that the existing state-of-the-art EL system performs poorly on Hansel (R@1 of 36.6% on Few-Shot). We then establish a strong baseline that scores a R@1 of 46.2% on Few-Shot and 76.6% on Zero-Shot on our dataset. We also show that our baseline achieves competitive results on TAC-KBP2015 Chinese Entity Linking task. The dataset is available at https://github.com/imryanxu/Hansel.


Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark

Shan et al. 2023

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining
