Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.286
Wikipedia Entities as Rendezvous across Languages: Grounding Multilingual Language Models by Predicting Wikipedia Hyperlinks

Abstract: Masked language models have quickly become the de facto standard when processing text. Recently, several approaches have been proposed to further enrich word representations with external knowledge sources such as knowledge graphs. However, these models are devised and evaluated in a monolingual setting only. In this work, we propose a language-independent entity prediction task as an intermediate training procedure to ground word representations on entity semantics and bridge the gap across different languages…
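
The abstract above is truncated, so the exact training recipe is not shown here. As a rough illustration of what a hyperlink-based entity-prediction intermediate task can look like, the sketch below mean-pools the encoder representations of a hyperlink's anchor span and classifies them against an entity vocabulary shared across languages. XLM-R is named as the backbone in a citing work quoted below; the span pooling, the linear head, and the `num_entities` vocabulary are illustrative assumptions, not the authors' exact method.

```python
# Minimal sketch (assumptions noted above): predict the Wikipedia entity a
# hyperlink anchor points to, sharing one entity vocabulary across languages.
import torch.nn as nn
from transformers import XLMRobertaModel


class EntityPredictionHead(nn.Module):
    def __init__(self, num_entities: int, model_name: str = "xlm-roberta-base"):
        super().__init__()
        self.encoder = XLMRobertaModel.from_pretrained(model_name)
        # Linear classifier over a language-independent entity vocabulary
        # (hypothetical size; a real inventory would come from Wikipedia links).
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_entities)

    def forward(self, input_ids, attention_mask, span_mask):
        # span_mask is 1 for tokens inside the hyperlink anchor, 0 elsewhere.
        hidden = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                               # (batch, seq, hidden)
        # Mean-pool the anchor-span tokens (pooling choice is an assumption).
        span = span_mask.unsqueeze(-1).float()
        pooled = (hidden * span).sum(dim=1) / span.sum(dim=1).clamp(min=1.0)
        return self.classifier(pooled)                    # (batch, num_entities)


# Intermediate-task step: cross-entropy against the gold entity id of each link.
# logits = model(input_ids, attention_mask, span_mask)
# loss = nn.functional.cross_entropy(logits, gold_entity_ids)
```

Because anchors in every language share the same entity targets, training pulls their representations toward common points, which is the "rendezvous" intuition in the title.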

Cited by 9 publications (11 citation statements)
References: 33 publications
“…Our experiments have demonstrated that entity supervision in EASE improves the quality of sentence embeddings both in the monolingual setting and, in particular, the multilingual setting. As recent studies have shown, entity annotations can be used as anchors to learn quality cross-lingual representations (Calixto et al., 2021; Nishikawa et al., 2021; Jian et al., 2022; Ri et al., 2022), and our work is another demonstration of their utility, particularly in sentence embeddings. One promising future direction is exploring how to better exploit the cross-lingual nature of entities.…”
Section: Discussion (mentioning)
confidence: 65%
“…thus offer useful cross-lingual alignment supervision (Calixto et al., 2021; Nishikawa et al., 2021; Jian et al., 2022; Ri et al., 2022). The extensive multilingual support of Wikipedia alleviates the need for a parallel resource to train well-aligned multilingual sentence embeddings, especially for low-resource languages.…”
Section: Introduction (mentioning)
confidence: 99%
“…While these works use hyperlinks to learn retrievers, we focus on using hyperlinks to create better context for learning general-purpose LMs. Separately, Calixto et al. (2021) use Wikipedia hyperlinks to learn multilingual LMs. Citation links are often used to improve summarization and recommendation of academic papers (Qazvinian and Radev, 2008; Yasunaga et al., 2019; Bhagavatula et al., 2018; Khadka et al., 2020; Cohan et al., 2020).…”
Section: Related Work (mentioning)
confidence: 99%
“…initialization) of Prix-LM. With the notable exception of Calixto et al. (2021), who rely on the prediction of Wikipedia hyperlinks as an auxiliary/intermediate task to improve XLM-R's multilingual representation space for cross-lingual transfer, there has not been any work on augmenting multilingual PLMs with structured knowledge. Previous work has indicated that off-the-shelf mBERT and XLM-R fail on knowledge-intensive multilingual NLP tasks such as entity linking and KG completion, and especially so for low-resource languages (Liu et al., 2021b).…”
Section: Related Work (mentioning)
confidence: 99%