Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing 2016
DOI: 10.18653/v1/d16-1102

Towards Semi-Automatic Generation of Proposition Banks for Low-Resource Languages

Abstract: Annotation projection based on parallel corpora has shown great promise in inexpensively creating Proposition Banks for languages for which high-quality parallel corpora and syntactic parsers are available. In this paper, we present an experimental study where we apply this approach to three languages that lack such resources: Tamil, Bengali and Malayalam. We find an average quality difference of 6 to 20 absolute F-measure points vis-à-vis high-resource languages, which indicates that annotation projection alon…

Cited by 14 publications (11 citation statements)
References 9 publications
“…Instead of dealing with heterogeneous linguistic theories, another line of research consists in actively studying the effect of using a single formalism across multiple languages through annotation projection or other transfer techniques (Akbik et al., 2015, 2016; Daza and Frank, 2019; Cai and Lapata, 2020; Daza and Frank, 2020). However, such approaches often rely on word aligners and/or automatic translation tools which may introduce a considerable amount of noise, especially in low-resource languages.…”
Section: Introduction (mentioning)
confidence: 99%
“…Symbolic SRT schemes such as SRL schemes and AMR have also been studied for their crosslinguistic applicability (Padó and Lapata, 2009; Sun et al., 2010; Xue et al., 2014), indicating partial portability across languages. Translated versions of PropBank and FrameNet have been constructed for multiple languages (e.g., Akbik et al., 2016; Hartmann and Gurevych, 2013). However, as both PropBank and FrameNet are lexicalized schemes, and as lexicons diverge wildly across languages, these schemes require considerable adaptation when ported across languages (Kozhevnikov and Titov, 2013).…”
Section: Discussion (mentioning)
confidence: 99%
“…Creating SRL datasets requires expert annotation, which is expensive. While there are some efforts on semi-automatic annotation targeting low-resource languages (e.g., Akbik et al., 2016), achieving high neural network performance with small or unlabeled datasets remains a challenge (e.g., Lapata, 2009, 2012; Titov and Klementiev, 2012; Gormley et al., 2014; Abend et al., 2009).…”
Section: Scenario 1: Low Training Data (mentioning)
confidence: 99%