2017
DOI: 10.1163/22105832-00701001

The application of growth curve modeling for the analysis of diachronic corpora

Abstract: This paper introduces growth curve modeling for the analysis of language change in corpus linguistics. In addition to describing growth curve modeling, which is a regression-based method for studying the dynamics of a set of variables measured over time, we demonstrate the technique through an analysis of the relative frequencies of words that are increasing or decreasing over time in a multi-billion-word diachronic corpus of Twitter. This analysis finds that increasing words tend to follow a trajectory simila…
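
To make the abstract's method concrete: growth curve modeling fits a parametric curve to a word's relative frequency over time by regression, and the fitted parameters summarize the trajectory. The sketch below fits a logistic growth curve, a standard S-shaped model of lexical diffusion, to a synthetic weekly frequency series by nonlinear least squares. The logistic form, the parameter values, and the data are illustrative assumptions, not the paper's actual model or data.

import numpy as np
from scipy.optimize import curve_fit

def logistic(t, k, r, t0):
    # S-shaped growth curve: k = ceiling frequency, r = growth rate,
    # t0 = inflection point (time of fastest growth)
    return k / (1.0 + np.exp(-r * (t - t0)))

# Hypothetical data: 52 weeks of relative frequency (per million words)
# for a single rising word; values are synthetic, not from the paper.
weeks = np.arange(52, dtype=float)
rng = np.random.default_rng(0)
freqs = logistic(weeks, 40.0, 0.3, 26.0) + rng.normal(0.0, 1.5, weeks.size)

# Fit the logistic curve by nonlinear least squares.
(k_hat, r_hat, t0_hat), _ = curve_fit(
    logistic, weeks, freqs, p0=[freqs.max(), 0.1, weeks.mean()]
)
print(f"ceiling ~{k_hat:.1f} per million, rate ~{r_hat:.2f}, "
      f"inflection ~week {t0_hat:.0f}")

The fitted parameters give an interpretable summary of each word's change: the ceiling indicates how widespread the word becomes, the rate how quickly it spreads, and the inflection point when growth peaks.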

Cited by 9 publications (12 citation statements)
References 33 publications
“…The regional dialect corpus used for this study consists of a large collection of geolocated Twitter data from the UK that we downloaded between 2014-01-01 and 2014-12-31 using the Twitter API. This data was collected as part of a larger project that has explored lexical variation on Twitter (see also Huang et al., 2016; Grieve et al., 2017, 2018; Nini et al., 2017). In total, this corpus contains 1.8 billion words, consisting of 180 million Tweets, posted by 1.9 million unique accounts.…”
Section: Methods (mentioning)
confidence: 99%
“…The corpus analyzed in this study represents American Twitter and consists of 8.9 billion words of geo-coded American mobile Twitter data, totaling 980 million Tweets written by 7 million users from across the contiguous United States, posted and downloaded between October 11th, 2013, and November 22nd, 2014, using the Twitter API (http://dev.twitter.com) (see Huang et al. 2016; Grieve et al. 2017; Nini et al. 2017). We focused on Twitter because this variety of language provides a uniquely large and accessible source of geo-coded and time-stamped natural language data.…”
Section: A Corpus of American Tweets (mentioning)
confidence: 99%
“…Language use on social media is informal and creative, which makes it a hotbed for lexical innovation. Recent work using Twitter data has focused, for example, on the identification of neologisms (Grieve et al., 2018), on their geographical diffusion (Eisenstein et al., 2014), and on trajectories of diffusion (Nini et al., 2017). Empirical investigations on the basis of Reddit data include studies of the linguistic dissemination of neologisms (Stewart and Jacob.…”
Section: Modelling and Measuring the Diffusion of Lexical Innovations (mentioning)
confidence: 99%
“…For linguists, social media provides large amounts of data of authentic language use, which opens up new opportunities for the empirical study of language variation and change. The size of these datasets as well as their informal nature allows for large-scale studies on the use and spread of new words, for example, to gain insights about general trajectories of diffusion (Nini et al., 2017) or about factors that influence whether new words spread successfully (Grieve, 2018). Moreover, metadata about speakers facilitate the study of aspects of diffusion that go beyond what can be captured by usage frequency alone.…”
Section: Introduction (mentioning)
confidence: 99%