Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages 2017
DOI: 10.18653/v1/w17-0109

Instant annotations in ELAN corpora of spoken and written Komi, an endangered language of the Barents Sea region

Abstract: The paper describes work in progress by the Izhva Komi language documentation project, which records new spoken language data, digitizes available recordings and annotates these multimedia data in order to provide a comprehensive language corpus as a database for future research on and for this endangered and under-described Uralic speech community. While working with a spoken variety and in the framework of documentary linguistics, we apply language technology methods and tools, which have been applied so f…

Cited by 11 publications (9 citation statements)
References 6 publications
“…The Northern Saami and Erzya corpora, for example, seem to have been created using a similar approach. Some work has been conducted on integrating these NLP tools into workflows commonly used in language documentation (Gerstenberger et al., 2016, 2017a,b). Since these languages often lack larger annotated resources, the use of infrastructures other than rule-based ones has not been common or possible, but these workflows have been implemented in a modular fashion that would enable the integration of other tools when they become available or reach the needed accuracy.…”
Section: Methods (mentioning)
confidence: 99%
“…By the end of this century, many will not survive with the decreasing number of the native speakers of such languages (Riza, 2008). This has alarmed not only native speakers of these languages but also the research community, directing their attention to language documentation as well as preservation and revitalization studies for these languages (Ćavar et al., 2016; Gerstenberger et al., 2017). Bird (2009) calls for a 'new kind of computational linguistics' in his paper that would protect this endangered invaluable cultural heritage by helping to accelerate these studies, and he ends his paper with these words: 'Who knows, we may even postpone the day when these languages utter their last words.'…”
Section: Introduction (mentioning)
confidence: 99%
“…However, with little data at hand these methods may not present a good solution. Therefore, Gerstenberger et al. (2017) suggest rule-based morpho-syntactic modelling for annotating small language data. In their study of the Komi language, their results show significant advantages of rule-based approaches for endangered languages, providing much more precise results in tagging as well as a 'full-fledged grammatical description based on broad empirical evidence' and future development of computer-assisted language learning systems.…”
Section: Introduction (mentioning)
confidence: 99%
“…Computational linguistic research on Komi is so far only in a development stage. However, an FST morphological analyzer and a (rudimentary) syntactic parser based on Constraint Grammar are available at Giellatekno/Divvun (Saami Language Technology at UiT The Arctic University of Norway)¹, and work on a complete Constraint Grammar description to be implemented in a rule-based syntactic parser is currently carried out in collaboration by Giellatekno, the Izhva Komi Documentation Project (Gerstenberger et al., 2016, 2017) and FU-Lab², which has also created a written Komi National corpus (with over 30M words), free electronic dictionaries and a Hunspell checker (including morpheme lists).…”
Section: Introduction (mentioning)
confidence: 99%