Discourse particles are among the most commented-upon features of Colloquial Singapore English (CSE). Their use has been shown to vary depending on formality, context, gender and ethnicity, although results differ from one study to another. This study uses the Corpus of Singapore English Messages (CoSEM), a large-scale corpus of texts composed by Singaporeans and sent using electronic messaging services, to investigate gender and ethnic factors as predictors of particle use. The results suggest a strong gender effect as well as several particle-specific ethnic effects. More generally, our study underlines the special nature of the grammatical class of discourse particles in CSE, which is open to new additions as the sociolinguistic and pragmatic need for them develops.
This article introduces the first version of the Corpus of Singapore English Messages (CoSEM), a 3.6‐million‐word monitor corpus of online text messages collected between 2016 and 2019, compiled and managed by a group of scholars who share an interest in Colloquial Singapore English (CSE) research. The paper explains the motivations behind developing a new corpus for the investigation of CSE. It also documents the process of compiling and organizing CoSEM and describes the corpus's initial structure and composition. We further discuss the social variables used in tagging the data, as well as ethical challenges, advantages, and disadvantages unique to online message datasets. In addition, we present preliminary analyses of two selected CSE features: (1) the Hokkien‐derived expression (bo)jio and (2) sentence‐final adverbs (already, also, only). As CoSEM is an ongoing project, we conclude the article with notes on future directions.
This paper presents the Twitter Corpus of Philippine Englishes (TCOPE): a dataset of 27 million tweets amounting to 135 million words collected from 29 cities across the Philippines. It provides an overview of the dataset, and then shows how it can be employed to examine Philippine English (PhilE) and its relationship with extralinguistic factors (e.g. ethno-geographic region, time, sex). The focus is on the patterns of variation involving four PhilE features: (1) irregular past tense morpheme -t, (2) double comparatives, (3) subjunctive were, and (4) phrasal verb base from. My analyses corroborate previous work and further demonstrate structured heterogeneity within PhilE, indicating that it is a multifaceted and dynamic variety. TCOPE has shown itself to be useful for exploring both the “general” features of contemporary PhilE and the different forms of variation within it. It contributes to a deeper understanding of Philippine English(es) over time and in different social contexts.