Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD 2021)
DOI: 10.1145/3447548.3467196
Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition

Abstract: Named entity recognition (NER) is a fundamental component of many applications, such as Web Search and Voice Assistants. Although deep neural networks greatly improve NER performance, their need for large amounts of training data makes them hard to scale out to many languages in an industry setting. To tackle this challenge, cross-lingual NER transfers knowledge from a rich-resource language to low-resource languages through pre-trained multilingual language models. Instea…
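As context for the transfer setting the abstract describes, here is a minimal sketch of zero-shot cross-lingual NER with a pre-trained multilingual encoder. The model name, label set, and Hugging Face transformers usage are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch: fine-tune a multilingual encoder for NER on a
# rich-resource language, then apply it zero-shot to a target language.
# Model name, label set, and training details are assumptions for
# illustration only; the paper's setup may differ.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

LABELS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]  # hypothetical tag set

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForTokenClassification.from_pretrained(
    "xlm-roberta-base", num_labels=len(LABELS)
)

# ... fine-tune `model` on labeled source-language (e.g., English) data ...

# Zero-shot inference on a target-language sentence: the shared
# multilingual representation is what carries knowledge across languages.
sentence = "Angela Merkel besuchte Paris."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # (1, seq_len, num_labels)
pred_ids = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, pid in zip(tokens, pred_ids):
    print(tok, LABELS[pid])
```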

Cited by 21 publications (23 citation statements)
References 36 publications
“…(2) Feature-based approaches rely on feature alignment to diminish the language shift (Chen et al. 2021; Ge et al. 2023). (3) Distillation-based methods (Wu et al. 2020b; Liang et al. 2021; Ma et al. 2022) enable the student network to gain task knowledge from soft labels predicted by the teacher network on the target language. However, these studies are designed on the single-source assumption and fail to deal with multiple source languages.…”
Section: Related Work
confidence: 99%
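To make the distillation idea in the statement above concrete, here is a minimal sketch of soft-label knowledge distillation for token classification, in the spirit of the teacher-student setup these methods describe. The tensor shapes, temperature, and loss weighting are illustrative assumptions.

```python
# Minimal sketch of soft-label distillation for cross-lingual NER:
# a teacher trained on the source language produces soft labels on
# unlabeled target-language text, and the student is trained to match
# them. Temperature and shapes are assumptions for illustration.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between teacher and student token-level distributions.

    Both logits tensors have shape (batch, seq_len, num_labels).
    """
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # batchmean KL over all tokens, scaled by T^2 as is conventional
    # in distillation
    return F.kl_div(
        s_log_probs.flatten(0, 1), t_probs.flatten(0, 1),
        reduction="batchmean"
    ) * temperature ** 2

# Usage: the teacher runs without gradients on target-language batches.
student_logits = torch.randn(8, 32, 9, requires_grad=True)
with torch.no_grad():
    teacher_logits = torch.randn(8, 32, 9)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```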
“…Recently, a growing body of literature has recognized the importance of cross-lingual sequence labeling tasks (xSL for short) (Huang et al., 2019; Lewis et al., 2020; Artetxe et al., 2019a), where translation systems are used to translate high-resource languages into low-resource languages to enrich the training data. However, the performance of xSL models is severely affected by translation quality (Yuan et al., 2020; Liang et al., 2021a).…”
Section: [SEP] [CLS]
confidence: 99%
“…However, when pushing the boundary of SL to low-resource languages, xSL can be very challenging due to limited training data. To tackle this challenge, various approaches have been proposed (Huang et al., 2019; Liang et al., 2021a). Cui et al. (2019) and Singh et al. (2019) used machine translation to obtain parallel data as a data augmentation method.…”
Section: Related Work
confidence: 99%
“…Following [34], our actions are sampled in batches, and the delayed reward is obtained after the two GNNs are updated according to a batch of sequential actions. Similar to [21], we utilize the performance of the models after being updated as the reward. We use the negative value of the cross-entropy loss to measure the performance of the models, as in [21, 42], defined as:…”
Section: Optimizations
confidence: 99%
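The equation itself is truncated in the snippet, but the text fully specifies it: the delayed reward is the negative cross-entropy loss of the models after the update. Here is a minimal sketch under that reading; the model and evaluation batch names are hypothetical placeholders.

```python
# Minimal sketch: delayed RL reward computed as the negative
# cross-entropy loss of the updated model on held-out data, as the
# cited snippet describes. `updated_model`, `eval_tokens`, and
# `eval_labels` are hypothetical placeholders.
import torch
import torch.nn.functional as F

def compute_reward(updated_model, eval_tokens, eval_labels):
    """Reward = -cross_entropy: lower loss after the update, higher reward."""
    updated_model.eval()
    with torch.no_grad():
        logits = updated_model(eval_tokens)   # (batch, num_classes)
        loss = F.cross_entropy(logits, eval_labels)
    return -loss.item()

# Usage with a toy model standing in for the updated GNN:
model = torch.nn.Linear(16, 4)
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
print(compute_reward(model, x, y))
```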