Findings of the Association for Computational Linguistics: ACL 2022
DOI: 10.18653/v1/2022.findings-acl.221

EnCBP: A New Benchmark Dataset for Finer-Grained Cultural Background Prediction in English

Abstract: While cultural backgrounds have been shown to affect linguistic expressions, existing natural language processing (NLP) research on culture modeling is overly coarse-grained and does not examine cultural differences among speakers of the same language. To address this problem and augment NLP models with cultural background features, we collect, annotate, manually validate, and benchmark EnCBP, a finer-grained news-based cultural background prediction dataset in English. Through language modeling (LM) evaluation…

Cited by 5 publications (10 citation statements)
References 16 publications
“…Specifically, following existing research (Mora, 2013; Tomlinson et al, 2014; Hershcovich et al, 2022), we define culture as the combination of human beliefs, norms, and customs among groups. Previous work in natural language processing (NLP) has primarily focused on cultural investigation of models (Hutchinson et al, 2020; Ross et al, 2021; Ma et al, 2022), with little emphasis on dialogue agents. Besides, probing is a popular way to study the characteristics of models or agents (Hämmerl et al, 2022; Arora et al, 2022;.…”
Section: Discussion (mentioning)
confidence: 99%
“…They investigate the regional differences at the performance level, while this paper tries to illustrate bias at the intrinsic level, i.e., differences in embedding space. Nevertheless, research by Ma et al (2022) points to bias in resource-abundant regions. This paper strives to diminish dependence on labeled task data by examining regional bias at the word embedding level before the fine-tuning process, potentially impacting downstream task performance.…”
Section: Studies Investigating Regional Differences in LLMs (mentioning)
confidence: 99%
“…Language is intertwined with culture due to differences in word usage behavior (Loveys et al, 2018), writing styles (Ma et al, 2022), common sense knowledge, debatable topics, and value systems (Hershcovich et al, 2022). Research has shown that these demographic differences in the task domain will harm the performance of downstream Natural Language Processing (NLP) tasks (Ma et al, 2022; Ghosh et al, 2021; Sun et al, 2021; González et al, 2020; Tan et al, 2020; Loveys et al, 2018).…”
Section: Introduction (mentioning)
confidence: 99%