Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial) 2017
DOI: 10.18653/v1/w17-1225

Learning to Identify Arabic and German Dialects using Multiple Kernels

Abstract: We present a machine learning approach for the Arabic Dialect Identification (ADI) and the German Dialect Identification (GDI) Closed Shared Tasks of the DSL 2017 Challenge. The proposed approach combines several kernels using multiple kernel learning. While most of our kernels are based on character p-grams (also known as n-grams) extracted from speech transcripts, we also use a kernel based on i-vectors, a low-dimensional representation of audio recordings, provided only for the Arabic data. In the learning …

Citations: cited by 24 publications (20 citation statements)
References: 23 publications
“…Remarkably, they obtain state-of-the-art performance without using knowledge from the target domain, which indicates that string kernels provide robust results in the cross-domain setting without any domain adaptation. Ionescu et al [18] obtained the best performance in the Arabic Dialect Identification Shared Task of the 2017 VarDial Evaluation Campaign [41], with an improvement of 4.6% over the second-best method. It is important to note that the training and the test speech samples prepared for the shared task were recorded in different setups [41], or in other words, the training and the test sets are drawn from different distributions.…”
Section: String Kernels
Citation type: mentioning
confidence: 99%
“…However, researchers proposed several domain adaptation techniques by using the unlabeled test data to obtain better performance [5,14,16,25,37]. Interestingly, some recent works [13,18] indicate that string kernels can yield robust results in the cross-domain setting without any domain adaptation. In fact, methods based on string kernels have demonstrated impressive results in various text classification tasks ranging from native language identification [22,23,24,36] and authorship identification [34] to dialect identification [4,18,21], sentiment analysis [13,35] and automatic essay scoring [7].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Team qcri mit used an ensemble of two SVMs and a stochastic gradient classifier (SGD). Team unibuckernel experimented with different kernels using kernel ridge regression (KRR) and kernel discriminant analysis (KDA) (Ionescu & Butnaru, 2017). They obtained their best results using KRR based on the sum of three kernels.…”
Section: German Dialect Identification
Citation type: mentioning
confidence: 99%
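
The statement above describes kernel ridge regression (KRR) trained on a sum of kernels. A minimal sketch of that general idea follows, under loudly stated assumptions: the count-based character p-gram (spectrum) kernel, the p-gram lengths (2, 3, 4), the regularization value, and every function name here are illustrative choices, not the kernels or settings reported by the cited authors. It uses only the Python standard library.

```python
from collections import Counter
import math


def pgram_counts(text, p):
    """Count every contiguous character p-gram in text."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(a, b, p):
    """Count-based p-gram kernel: dot product of the two p-gram count vectors.
    Assumption: a stand-in for whichever string kernels the paper uses."""
    ca, cb = pgram_counts(a, p), pgram_counts(b, p)
    return float(sum(n * cb[g] for g, n in ca.items()))


def normalized_kernel(a, b, p):
    """Scale the kernel so that k(x, x) = 1 for any string with a p-gram."""
    denom = math.sqrt(spectrum_kernel(a, a, p) * spectrum_kernel(b, b, p))
    return spectrum_kernel(a, b, p) / denom if denom else 0.0


def summed_kernel(a, b, ps=(2, 3, 4)):
    """Combine kernels by summation; the lengths in ps are hypothetical."""
    return sum(normalized_kernel(a, b, p) for p in ps)


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def fit_krr(train_texts, labels, lam=1.0):
    """Dual KRR: alpha = (K + lam * I)^-1 Y with one-vs-all targets of +1/-1."""
    classes = sorted(set(labels))
    K = [[summed_kernel(a, b) for b in train_texts] for a in train_texts]
    for i in range(len(K)):
        K[i][i] += lam
    Y = [[1.0 if y == c else -1.0 for c in classes] for y in labels]
    return classes, solve(K, Y)


def predict(text, train_texts, classes, alpha):
    """Score each class as sum_i k(text, x_i) * alpha_i and take the argmax."""
    k = [summed_kernel(text, t) for t in train_texts]
    scores = [sum(ki * a[j] for ki, a in zip(k, alpha))
              for j in range(len(classes))]
    return classes[scores.index(max(scores))]


# Toy data, invented purely for illustration.
train_texts = ["aaaa bbbb", "abab abab", "xyxy zzzz", "zzxx yyzz"]
labels = ["A", "A", "B", "B"]
classes, alpha = fit_krr(train_texts, labels)
print(predict("abab", train_texts, classes, alpha))
```

Summing kernels corresponds to concatenating their feature spaces, which is one simple way to combine several p-gram lengths without tuning per-kernel weights; a weighted sum would be the natural next step for multiple kernel learning.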
“…Our algorithm is inspired by LRD [9] which has successfully been applied to phylogenetic analysis [9], sequence alignment [10], native language identification [11], [12], [13] and Arabic dialect identification [14], [32]. We next present how we adapt LRD and obtain a novel algorithm, termed Local Frame Match Distance, for the task of gesture recognition.…”
Section: Local Frame Match Distance
Citation type: mentioning
confidence: 99%