Proceedings of the Sixth Workshop On 2019
DOI: 10.18653/v1/w19-1402

Improving Cuneiform Language Identification with

Abstract: We describe the systems developed by the National Research Council Canada for the Cuneiform Language Identification (CLI) shared task at the 2019 VarDial evaluation campaign. We compare a state-of-the-art baseline relying on character n-grams and a traditional statistical classifier, a voting ensemble of classifiers, and a deep learning approach using a Transformer network. We describe how these systems were trained, and analyze the impact of some preprocessing and model estimation decisions. The deep neural n…

Cited by 15 publications (9 citation statements)
References 19 publications
“…We used the same LM adaptation scheme as presented in this paper with the HeLI 2.0 method as well as with a custom NB implementation (Jauhiainen, Jauhiainen, and Lindén 2019), two teams used such a scheme with SVMs (Benites, von Däniken, and Cieliebak 2019; Wu et al. 2019) and one learned new information from the test set with deep neural networks (Bernier-Colborne, Goutte, and Léger 2019). All three shared tasks concentrating on language, dialect, or variety identification were won using one of these systems.…”
Section: Discussion | Citation type: mentioning
confidence: 99%
“…The results by Bernier-Colborne et al. (2019) in the CLI shared task seem to indicate that recent developments in contextual embedding representations may also yield performance improvement in language identification applied to similar languages, varieties, and dialects.…”
Section: Applications | Citation type: mentioning
confidence: 99%
“…In terms of computational methods, the bulk of research on this topic and the systems submitted to the DSL shared tasks at VarDial have shown that traditional machine learning classifiers such as support vector machines (Cortes and Vapnik 1995) tend to outperform dense neural network approaches for similar languages and language varieties (Bestgen 2017; Medvedeva, Kroon, and Plank 2017). The best system (Bernier-Colborne, Goutte, and Léger 2019) submitted to the VarDial 2019 Cuneiform Language Identification (CLI) shared task (Zampieri et al. 2019), however, has outperformed traditional machine learning methods using a BERT-based (Devlin et al.…”
Section: Applications | Citation type: mentioning
confidence: 99%
“…Based on this research, they had written a master's thesis, "Tekstin kielen automaattinen tunnistaminen" (Automatic identification of the language of a text) [15]. The thesis was produced as part of the National Library of Finland's development work on collecting online publications, that is, web harvesting. Alongside the thesis, they had built a language identifier capable of distinguishing the language of a text from among 103 languages.…”
Section: Project Planning | Citation type: unclassified
“…The competition proved very challenging, owing both to the unusual test setup and to the large training corpora. Only the research group of the National Research Council Canada managed to produce results for the competition using BERT-based deep learning methods [9], with which the same group had won the CLI competition held in 2019 [5,43]. Their results, however, fell significantly short of the baseline results produced by the SUKI project [4].…”
Section: More Recent Studies and Outputs | Citation type: unclassified