Discriminating Between Similar Nordic Languages

Haas, René; Derczynski, Leon

doi:10.48550/arxiv.2012.06431

Cited by 1 publication

(1 citation statement)

References 2 publications

“…The gap from acc@1 to acc@3 is much larger for langid.py and FastText, illustrating a higher confusion. Recent work in language identification suggests that the accuracy gap might be a symptom of confusion of related languages (Haas and Derczynski, 2020).…”

Section: Error Analysis; citation type: mentioning

confidence: 99%

A reproduction of Apple's bi-directional LSTM models for language identification in short strings

Toftrup, Sørensen, Ciosici et al. 2021

Preprint

Language Identification is the task of identifying a document's language. For applications like automatic spell checker selection, language identification must use very short strings such as text message fragments. In this work, we reproduce a language identification architecture that Apple briefly sketched in a blog post. We confirm the bi-LSTM model's performance and find that it outperforms current open-source language identifiers. We further find that its language identification mistakes are due to confusion between related languages.
