Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Confere 2015
DOI: 10.3115/v1/p15-2044

If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages

Abstract: We present a simple method for learning part-of-speech taggers for languages like Akawaio, Aukan, or Cakchiquel, languages for which nothing but a translation of parts of the Bible exists. By aggregating over the tags from a few annotated languages and spreading them via word alignment on the verses, we learn POS taggers for 100 languages, using the languages to bootstrap each other. We evaluate our cross-lingual models on the 25 languages where test sets exist, as well as on another 10 for which we have tag di…
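The aggregation step the abstract describes can be illustrated with a minimal sketch. It assumes plain, unweighted majority voting over tags projected through word-alignment links; the function name `project_tags`, the data layout, and the toy sentences are hypothetical and are not taken from the paper, whose actual voting and alignment procedure may differ.

```python
from collections import Counter


def project_tags(target_len, sources):
    """Project POS tags onto a target sentence by majority vote.

    target_len: number of tokens in the target-language verse.
    sources: list of (source_tags, alignment) pairs, one per source language,
        where alignment is a list of (source_index, target_index) links.
    Returns one tag per target token, or None for unaligned tokens.
    """
    votes = [Counter() for _ in range(target_len)]
    for source_tags, alignment in sources:
        for src_i, tgt_i in alignment:
            # Each alignment link casts one vote for the source token's tag.
            votes[tgt_i][source_tags[src_i]] += 1
    # Pick the most frequent tag per target token; leave unaligned tokens untagged.
    return [v.most_common(1)[0][0] if v else None for v in votes]


# Toy example (made-up data): three source languages vote on a 4-token verse.
example_sources = [
    (["DET", "NOUN", "VERB"], [(0, 0), (1, 1), (2, 2)]),
    (["NOUN", "VERB"], [(0, 1), (1, 2)]),
    (["DET", "ADJ", "VERB"], [(0, 0), (1, 1), (2, 2)]),
]
projected = project_tags(4, example_sources)
print(projected)  # ['DET', 'NOUN', 'VERB', None]
```

The fourth target token receives no alignment link and therefore stays untagged; a tagger trained on such projections would have to tolerate partially labeled sentences.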

Citations: cited by 42 publications (52 citation statements)
References: 9 publications
“…In future work, we plan to investigate other methods for seed pairs selection, settings with scarce resources (Agić et al, 2015; Zhang et al, 2016), other context types inspired by recent work in the monolingual settings (Levy and Goldberg, 2014a; Melamud et al, 2016), as well as model adaptations that can work with multi-word expressions. Encouraged by the excellent results, we also plan to test the portability of the approach to more language pairs, and other tasks and applications.…”
Section: Discussion; citation type: mentioning
confidence: 99%
“…Täckström et al (2013a) further improve this to 89% by leveraging Wiktionary. For some languages, there are even less resources available; Agić et al (2015b) were able to reach accuracies around 70% by using partial or full Bible translation. Our methods could thus be applied even in a more realistic scenario, where gold POS tags are not available for the target text, by using a weakly-supervised POS tagger.…”
Section: Related Work; citation type: mentioning
confidence: 99%
“…A new tagger is then trained on the target side, with some smoothing to reduce the noise caused by alignment errors. Follow-up work has focused on the inclusion of several source languages (Fossum and Abney, 2005), more accurate projection algorithms (Das and Petrov, 2011; Duong et al, 2013), the integration of external lexicon sources (Li et al, 2012; Täckström et al, 2013), the extension from part-of-speech tagging to full morphological tagging (Buys and Botha, 2016), and the investigation of truly low-resource settings by resorting to Bible translations (Agić et al, 2015). A related approach (Aepli et al, 2014) uses majority voting to disambiguate tags proposed by several source languages.…”
Section: Related Work; citation type: mentioning
confidence: 99%