Proceedings of the Second Workshop on Insights From Negative Results in NLP 2021
DOI: 10.18653/v1/2021.insights-1.6
Are BERTs Sensitive to Native Interference in L2 Production?

Abstract: Using the essay component of the International Corpus Network of Asian Learners of English (ICNALE) and the TOEFL11 corpus, we fine-tuned BERT-based neural language models to predict English learners' native languages. Results showed that neural models can learn to represent and detect such native-language influence, but multilingually trained models have no advantage in doing so.

Cited by 2 publications (1 citation statement)
References 19 publications (26 reference statements)
“…As mentioned in Section 3, we were initially inspired by the fact that adding more training data did not seem to improve classification performance. In addition, earlier work indicated that classifying the country of origin of an author based on their English text provides good results, with Tang et al (2021) reporting an accuracy of 87% on all of ICNALE for this task. We argue that this points at signals of L1 in the English learner texts that a classifier can pick up on, and that consequently, finding a way to make input text more homogeneous to a classifier through debiasing (Section 4.1) can lead to CEFR classification performance gains (Section 4.2).…”
Section: Methods and Results
confidence: 88%