Proceedings of the Sixth Workshop On 2019
DOI: 10.18653/v1/w19-1409

Language and Dialect Identification of Cuneiform Texts

Abstract: This article introduces a corpus of cuneiform texts from which the dataset for the use of the Cuneiform Language Identification (CLI) 2019 shared task was derived as well as some preliminary language identification experiments conducted using that corpus. We also describe the CLI dataset and how it was derived from the corpus. In addition, we provide some baseline language identification results using the CLI dataset. To the best of our knowledge, the experiments detailed here represent the first time that aut…

Cited by 25 publications (26 citation statements)
References 13 publications

“…2019) system to discriminating between Sumerian and Akkadian historical dialects in Cuneiform script (Jauhiainen et al. 2019a). BERT and other Transformer-based contextual representations have been recently applied to various NLP tasks achieving state-of-the-art results.…”
Section: Applications (mentioning)
confidence: 99%
“…In terms of computational methods, the bulk of research on this topic and the systems submitted to the DSL shared tasks at VarDial have shown that traditional machine learning classifiers such as support vector machines (Cortes and Vapnik 1995) tend to outperform dense neural network approaches for similar languages and language varieties (Bestgen 2017; Medvedeva, Kroon, and Plank 2017). The best system (Bernier-Colborne, Goutte, and Léger 2019) submitted to the VarDial 2019 Cuneiform Language Identification (CLI) shared task (Zampieri et al 2019), however, has outperformed traditional machine learning methods using a BERT-based (Devlin et al 2019) system to discriminating between Sumerian and Akkadian historical dialects in Cuneiform script (Jauhiainen et al 2019a). BERT and other Transformer-based contextual representations have been recently applied to various NLP tasks achieving state-of-the-art results.…”
Section: Language and Dialect Identification (mentioning)
confidence: 99%
“…Alongside the workshop, contacts were established with the Giellatekno research group at the University of Tromsø, and the group's work was explored. 17 Later the same year, T. Jauhiainen was invited to Tallinn to present the project at the "Web Archiving: Preserving the History of Data-Driven Society" event, 18 held at the National Library of Estonia.…”
Section: Web Crawling (unclassified)
“…In 2019 they had also organized one of the competitions themselves. The competition, "Cuneiform Language Identification" (CLI), focused on identifying the dialects of Akkadian and Sumerian texts written in cuneiform [17]. The 2020 competition was named "Uralic Language Identification" (ULI), and it focused in particular on identifying text snippets in rare Uralic languages within a large amount of text written mostly in non-Uralic languages [25].…”
Section: More Recent Studies and Outputs (unclassified)
“…The baseline methods and their results included in the table are described by Jauhiainen et al (2019a). The NRC-CNRC team submitted three runs.…”
Section: Participants and Approaches (mentioning)
confidence: 99%