2022
DOI: 10.1162/tacl_a_00467

Data-driven Model Generalizability in Crosslinguistic Low-resource Morphological Segmentation

Abstract: Common designs of model evaluation typically focus on monolingual settings, where different models are compared according to their performance on a single data set that is assumed to be representative of all possible data for the task at hand. While this may be reasonable for a large data set, this assumption is difficult to maintain in low-resource scenarios, where artifacts of the data collection can yield data sets that are outliers, potentially making conclusions about model performance coincidental. To ad…

Cited by 2 publications (1 citation statement)
References 72 publications
“…To control for variations that could arise due to training data sub-sampling (Liu and Prud'hommeaux, 2022), we run each experiment on five disjoint subsets of the parallel training data and report the average results.…”
Section: Low Resource Setting For Parallel Data (mentioning)
confidence: 99%
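
The procedure described in the citation statement above (run each experiment on five disjoint subsets of the training data and report the average) can be sketched roughly as follows. This is only an illustrative sketch, not the citing authors' code: the function names `disjoint_subsets` and `average_over_subsets`, the `run_experiment` callback, the subset size, and the fixed seed are all assumptions introduced here.

```python
import random
from statistics import mean


def disjoint_subsets(items, k, size, seed=0):
    """Return k non-overlapping random subsets of `items`, each of length `size`."""
    items = list(items)
    if k * size > len(items):
        raise ValueError("not enough items for k disjoint subsets of this size")
    # Shuffle once, then cut consecutive slices so no item appears twice.
    random.Random(seed).shuffle(items)
    return [items[i * size:(i + 1) * size] for i in range(k)]


def average_over_subsets(items, run_experiment, k=5, size=100, seed=0):
    """Run `run_experiment` on each subset; return (mean score, per-subset scores)."""
    scores = [run_experiment(subset) for subset in disjoint_subsets(items, k, size, seed)]
    return mean(scores), scores
```

Here `run_experiment` stands in for whatever training-and-evaluation routine returns a single numeric score per subset. Keeping the per-subset scores alongside the mean makes it possible to see how much a result depends on the particular sample drawn.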