It is well known that Outer Circle English has undergone extensive contact-induced lexical and grammatical restructuring. Is it possible to use common NLP tools developed for Inner Circle English to process Outer Circle English texts? Here, we report our experience of using the Stanford PoS tagger to tag the Singaporean component of the International Corpus of English (ICE-SIN). We isolate two major contact-related causes of tagging errors: (1) lexical and grammatical loans directly borrowed from the local languages; and (2) English-origin words with new grammatical meanings acquired from the local languages. While the first type may be easy to overcome, the second type is intractable, creating an extra layer of morphosyntactic complexity. We achieved comparable accuracy rates in the more formal registers, and a lower but still decent 88% in the informal register of private conversations. A tagged ICE-SIN allows us to investigate lexical and grammatical restructuring at unprecedented levels of detail.
China is ethnically and linguistically diverse. There are 56 officially recognized ethnic groups in the country, including the majority Han, with a population of 1.2 billion, and the Tatar, the smallest minority group, with only 3,556 people residing in Xinjiang, according to the 2010 Population Census of the People’s Republic of China, the latest census data available on the government’s website (www.stats.gov.cn). The Han account for 91.6% of the population, with the minorities making up the remaining 8.4%. Most ethnic groups have their own languages, which fall into typologically distinct language families, the largest being Altaic and Sino-Tibetan. Ethnologue lists 299 languages in China and assigns the country a linguistic diversity index of 0.521, compared with 0.035 for Japan and 0.010 for South Korea (Simons & Fennig 2017). A few ethnic groups, such as the Hui (Chinese Muslims) and the Manchus, who founded the Qing (1644–1912), the last imperial dynasty, have lost their indigenous languages over the centuries and now speak the language of the Han majority. Linguistic diversity in China is manifested in two ways: across the ethnic groups and within the Han majority. In what follows, we give a schematic description of the languages and briefly summarize the papers in this issue that offer a snapshot of language contact in China.
It is well-documented that patients with semantic dementia and Alzheimer’s disease present with difficulty in lexical retrieval and reversal of the concreteness effect in nouns and verbs. Little is known about the lexical phenomena before the onset of symptoms. We anticipate that there are linguistic signs in the speech of people who suffer from mild cognitive impairment (MCI), the prodromal stage of dementia. Here, we report the results of a novel corpus-linguistic approach to the early detection of cognitive impairment. We recorded 40 hours of natural, unconstrained speech of 188 English-speaking Singaporeans; 90 are diagnosed with MCI (51 amnestic, 39 nonamnestic), and 98 are cognitively healthy. The recordings yield 327,470 words, which are tagged for parts of speech. We calculate the per-minute speech rates and concreteness scores of nouns and verbs, and of all tagged words, in our dataset. Our analysis shows that the two measures of nouns and verbs identify different subtypes of MCI. Compared with healthy controls, subjects with amnestic MCI produce fewer but more abstract nouns, whereas subjects with nonamnestic MCI produce fewer but more concrete verbs. Cognitive impairment is manifested in ordinary language before the presentation of clinical symptoms, and can be detected through non-invasive corpus-based analysis of natural speech.
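The two measures named in the abstract above, per-minute speech rate and mean concreteness score by part of speech, can be illustrated with a minimal sketch. The abstract does not say which tagset or concreteness norms were used, so the sketch assumes Penn Treebank-style tags (noun tags starting with `NN`, verb tags with `VB`) and a small made-up lexicon on a 1 to 5 scale. The names `TOY_CONCRETENESS`, `Measures`, and `measures_for` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon: word -> concreteness rating (assumed 1-5 scale).
# A real study would substitute published concreteness norms.
TOY_CONCRETENESS: Dict[str, float] = {
    "dog": 4.9,
    "idea": 1.6,
    "run": 4.0,
    "think": 2.4,
}


@dataclass
class Measures:
    rate_per_minute: float               # tokens of this word class per minute
    mean_concreteness: Optional[float]   # None if no token has a rating


def measures_for(
    tagged: List[Tuple[str, str]],
    tag_prefix: str,
    minutes: float,
    lexicon: Dict[str, float],
) -> Measures:
    """Compute speech rate and mean concreteness for one word class.

    `tagged` is a list of (word, tag) pairs; `tag_prefix` selects the
    word class (assumption: "NN" for nouns, "VB" for verbs).
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    # Keep only the tokens whose tag belongs to the requested word class.
    words = [word.lower() for word, tag in tagged if tag.startswith(tag_prefix)]
    # Only tokens found in the lexicon contribute to the concreteness mean.
    scores = [lexicon[w] for w in words if w in lexicon]
    return Measures(
        rate_per_minute=len(words) / minutes,
        mean_concreteness=mean(scores) if scores else None,
    )


# Invented sample: seven tagged tokens from a two-minute recording.
sample = [
    ("the", "DT"), ("dog", "NN"), ("had", "VBD"), ("an", "DT"),
    ("idea", "NN"), ("think", "VB"), ("run", "VB"),
]
nouns = measures_for(sample, "NN", minutes=2.0, lexicon=TOY_CONCRETENESS)
verbs = measures_for(sample, "VB", minutes=2.0, lexicon=TOY_CONCRETENESS)
```

In this sketch, words missing from the lexicon still count toward the speech rate but are left out of the concreteness mean. Whether the study handled unrated words this way, or lemmatized tokens before lookup, is not stated in the abstract.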