Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023) 2023
DOI: 10.18653/v1/2023.vardial-1.20

Dialect Representation Learning with Neural Dialect-to-Standard Normalization

Olli Kuparinen,
Yves Scherrer

Abstract: Language label tokens are often used in multilingual neural language modeling and sequence-to-sequence learning to enhance the performance of such models. An additional product of the technique is that the models learn representations of the language tokens, which in turn reflect the relationships between the languages. In this paper, we study the learned representations of dialects produced by neural dialect-to-standard normalization models. We use two large datasets of typologically different languages, name…

Cited by 1 publication (1 citation statement)
References 21 publications (28 reference statements)

“…Note that in all our experiments we systematically use sentence-level contexts. However, our previous work has shown that contexts of sliding windows of three words can bring significant improvements, especially for the TF-based systems (Kuparinen et al, 2023). This approach requires word-level data alignments which are not trivial to produce.…”
Section: Official Results (mentioning)
confidence: 99%