2022
DOI: 10.1017/s1351324922000225
Ad astra or astray: Exploring linguistic knowledge of multilingual BERT through NLI task

Abstract: Recent research has reported that standard fine-tuning approaches can be unstable because they are prone to various sources of randomness, including but not limited to weight initialization, training data order, and hardware. Such brittleness can lead to different evaluation results, prediction confidences, and generalization inconsistencies for the same models independently fine-tuned under the same experimental setup. Our paper explores this problem in natural language inference, a common task in benchmarking pract…

Cited by 2 publications (1 citation statement)
References: 77 publications
“…Thus, it is necessary to investigate the mechanisms by which these models ‘comprehend’ ancient Chinese. In fact, previous work has explored how pre-trained language models ‘learn’ linguistic knowledge, including probing lexical [6,7], syntactic [8,9] and semantic [10–12] knowledge encoded in the models. However, most of these works focus on probing the knowledge at certain aspects, without adopting a holistic perspective to study how the models simulate human language, or to discern potential patterns in how they organize the elements of ancient Chinese.…”
Section: Introduction (mentioning, confidence: 99%)