Aggregating Dialectology, Typology, and Register Analysis 2014
DOI: 10.1515/9783110317558.174

A weakly supervised multivariate approach to the study of language variation

Cited by 45 publications
(10 citation statements)
References 0 publications
“…LI for closely-related languages, language varieties, and dialects has been studied for Malay-Indonesian (Ranaivo-Malançon, 2006), Indian languages (Murthy and Kumar, 2006), South Slavic languages (Ljubešić et al, 2007; Tiedemann and Ljubešić, 2012; Kranjcić, 2014, 2015), Serbo-Croatian dialects (Zecevic and Vujicic-Stankovic, 2013), English varieties (Lui and Cook, 2013; Simaki et al, 2017), Dutch-Flemish (van der Lee and Bosch, 2017), Dutch dialects (including a temporal dimension) (Trieschnigg et al, 2012), German dialects (Hollenstein and Aepli, 2015), Mainland-Singaporean-Taiwanese Chinese (Huang and Lee, 2008), Portuguese varieties (Zampieri and Gebre, 2012), Spanish varieties (Maier and Gómez-Rodríguez, 2014), French varieties (Mokhov, 2010a,b; Diwersy et al, 2014), languages of the Iberian Peninsula, Romanian dialects (Ciobanu and Dinu, 2016), and Arabic dialects (Zaidan and Callison-Burch, 2014; Tillmann et al, 2014; Sadat et al, 2014b; Wray, 2018), the last of which we discuss in more detail in this section. As to off-the-shelf tools which can identify closely-related languages, Zampieri and Gebre (2014) released a LI system trained to identify 27 languages, including 10 language varieties.…”
Section: Similar Languages Language Varieties and Dialects (mentioning)
confidence: 99%
“…by substituting named entities or content words by placeholders), or at a higher level of abstraction, using POS tags or other morphosyntactic information (Lui et al, 2014b; Bestgen, 2017), or even adversarial machine learning to modify the learned representations to remove such artefacts (Li et al, 2018). Finally, an interesting research direction could be to combine work on closely-related languages with the analysis of regional or dialectal differences in language use (Peirsman et al, 2010; Anstein, 2013; Doyle, 2014; Diwersy et al, 2014).…”
Section: Similar Languages Language Varieties and Dialects (mentioning)
confidence: 99%
“…There have been studies that went beyond lexical features in an attempt to capture some of the abstract systemic differences between similar languages using linguistically motivated features. This includes the use of semi-delexicalized text representations in which named entities or content words are replaced by placeholders, or fully de-lexicalized representations using POS tags and other morphosyntactic information (Zampieri, Gebre, and Diwersy 2013;Diwersy, Evert, and Neumann 2014;Bestgen 2017).…”
Section: Language and Dialect Identification (mentioning)
confidence: 99%
“…Language identification was studied for closely related languages such as Malay-Indonesian (Ranaivo-Malançon 2006), South Slavic languages (Ljubešić, Mikelić, and Boras 2007;Tiedemann and Ljubešić 2012), and languages of the Iberian Peninsula (Zubiaga et al 2014). It was also applied to national varieties of English (Lui and Cook 2013;Simaki et al 2017), French (Mokhov 2010;Diwersy et al 2014), Chinese (Huang and Lee 2008), and Portuguese (Zampieri and Gebre 2012;Zampieri et al 2016), as well as to dialects of Romanian (Ciobanu and Dinu 2016), Arabic (Elfardy and Diab 2013; Zaidan and Callison-Burch 2014; Tillmann, Al-Onaizan, and Mansour 2014; Sadat, Kazemi, and Farzindar 2014; Wray 2018), and German (Hollenstein and Aepli 2015). The VarDial shared tasks included the languages in the DSLCC, as well as Chinese varieties, Dutch and Flemish, dialects of Arabic, Romanian, and German, and many others.…”
Section: Language and Dialect Identification (mentioning)
confidence: 99%
“…One possible confounding factor is the topicality of the training data: if the data for each variety is drawn from different datasets, it is possible that a classifier will simply learn the topical differences between datasets. Diwersy et al (2014) carried out a study of colligations in French varieties, where the variation in the grammatical function of noun lemmas was studied across French-language newspapers from six countries. In their initial analysis they found that the characteristic features of each country included the name of the country and other country-specific proper nouns, which resulted in near 100% classification accuracy but did not provide any insight into national varieties from a linguistic perspective.…”
Section: De-lexicalized Text Representation For Dsl (mentioning)
confidence: 99%
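
Several of the statements above describe semi-delexicalized representations (named entities or content words replaced by placeholders) and fully de-lexicalized ones (tokens replaced by POS tags) as a way to keep a classifier from learning country names and topics. The following is a minimal Python sketch of that idea, not the method of any cited paper. The placeholder symbol `NE`, the function names, the hand-written list of masked words, and the toy tag lexicon are all assumptions made for illustration; a real pipeline would likely rely on a named-entity recognizer and a trained POS tagger.

```python
import re

PLACEHOLDER = "NE"  # hypothetical placeholder symbol, not taken from the cited papers


def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def semi_delexicalize(text, masked_words):
    """Replace tokens listed in masked_words (case-insensitive) with a placeholder."""
    masked = {word.lower() for word in masked_words}
    return [PLACEHOLDER if tok.lower() in masked else tok for tok in tokenize(text)]


def fully_delexicalize(text, tag_lexicon, unknown_tag="X"):
    """Replace every token with a tag from a toy lookup table.

    The lookup table is an assumption standing in for a trained POS tagger.
    """
    return [tag_lexicon.get(tok.lower(), unknown_tag) for tok in tokenize(text)]


sentence = "The Dakar press praised Senegal."
print(semi_delexicalize(sentence, ["Dakar", "Senegal"]))
# ['The', 'NE', 'press', 'praised', 'NE', '.']

toy_tags = {"the": "DET", "press": "NOUN", "praised": "VERB", ".": "PUNCT"}
print(fully_delexicalize(sentence, toy_tags))
# ['DET', 'X', 'NOUN', 'VERB', 'X', 'PUNCT']
```

In both variants the country-specific proper nouns no longer reach the feature extractor, so any remaining signal has to come from function words, word order, or grammatical categories.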