2016
DOI: 10.1111/cogs.12392

Social Media and Language Processing: How Facebook and Twitter Provide the Best Frequency Estimates for Studying Word Recognition

Abstract: Corpus‐based word frequencies are one of the most important predictors in language processing tasks. Frequencies based on conversational corpora (such as movie subtitles) are shown to better capture the variance in lexical decision tasks compared to traditional corpora. In this study, we show that frequencies computed from social media are currently the best frequency‐based estimators of lexical decision reaction times (up to 3.6% increase in explained variance). The results are robust (observed for Twitter‐ a…
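The "increase in explained variance" reported in the abstract can be illustrated with a minimal sketch. It assumes a one-predictor linear fit of reaction time on log-transformed frequency, which is a simplification and not necessarily the model used in the study. The function names, the `log10(count + 1)` transform, and all numbers below are hypothetical toy values chosen for illustration; they are not data from the paper.

```python
import math


def r_squared(x, y):
    """Share of variance in y explained by a one-predictor linear fit on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # For simple linear regression, R^2 equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)


def log_frequency(counts):
    """log10(count + 1); the +1 smoothing is an assumption of this sketch."""
    return [math.log10(c + 1) for c in counts]


# Hypothetical toy data: mean lexical decision times (ms) for five words,
# and raw counts of the same words in two made-up corpora, A and B.
rt_ms = [520, 560, 610, 650, 700]
counts_a = [9000, 4000, 900, 300, 40]
counts_b = [5000, 7000, 300, 800, 90]

r2_a = r_squared(log_frequency(counts_a), rt_ms)
r2_b = r_squared(log_frequency(counts_b), rt_ms)

# Difference in explained variance between the two frequency norms.
gain = r2_a - r2_b
print(f"R^2 (A) = {r2_a:.3f}, R^2 (B) = {r2_b:.3f}, gain = {gain:.3f}")
```

Comparing norms on the same set of words and the same reaction times, as done here, is what makes the two R² values directly comparable.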

Cited by 51 publications (57 citation statements)
References 32 publications

“…For instance, a series of recent studies demonstrated that frequency norms derived from subtitles of films and TV programs tended to outperform those from printed texts in accounting for the variance of lexical processing time (and sometimes also accuracy) among native speakers of different languages (Brysbaert, Keuleers, & New, 2011; Brysbaert, Buchmeier, et al., 2011; Brysbaert & New, 2009; Cai & Brysbaert, 2010; Cuetos, Glez-Nosti, Barbón, & Brysbaert, 2011; Dimitropoulou & Carreiras, 2010; Duchon, Perea, Sebastián-Gallés, Martí, & Carreiras, 2013; Keuleers, Brysbaert, & New, 2010; Mandera, Keuleers, Wodniecka, & Brysbaert, 2015; New, Brysbaert, Veronis, & Pallier, 2007; Soares et al., 2015; van Heuven, Mandera, Keuleers, & Brysbaert, 2014; but see an exception in Pham, 2014, for Vietnamese). Internet-based frequency norms (e.g., based on Web newsgroup discussion), particularly those derived from recent social media sources (e.g., materials from blogs, Facebook, or Twitter), were also found to show comparable or better performance in predicting lexical processing, compared with other frequency norms (Balota et al., 2004; Burgess & Livesay, 1998; Herdağdelen & Marelli, 2017). The superiority of subtitle-based or Internet-based frequency norms could be attributed to the increasing dominance of TV and the Internet in people's daily lives, which makes subtitles and Internet materials more representative of language use (e.g., Brysbaert, Buchmeier, et al., 2011; but see the different view in Baayen, Milin, & Ramscar, 2016, and Heister & Kliegl, 2012).…”
Section: Predictive Validity Of Corpus-based Frequency Norms In L1
Citation type: mentioning
confidence: 96%
“…In addition to genre, the dialectal variety also matters. For example, it was discovered that frequency norms from UK sources were better correlated with RTs from UK participants, whereas frequency norms from US sources were better correlated with RTs from US participants (Herdağdelen & Marelli, 2017; van Heuven et al., 2014).…”
Section: Predictive Validity Of Corpus-based Frequency Norms In L1
Citation type: mentioning
confidence: 99%
“…Of note, corpora that closely match day-to-day linguistic exposure typically produce the highest quality estimates of lexical variables (e.g., word frequency), as measured by the ability of those variables to predict lexical access times (e.g., Keuleers, Brysbaert, & New, 2010; Herdağdelen & Marelli, 2016). This observation is convergent with the earlier claim that the context distribution for memory retrieval during context-sparse psychological experiments is likely supplied by the active thoughts and knowledge that participants bring with them into the laboratory; in a moment, an argument will be provided that word frequency and contextual diversity are fundamentally related to each other as operationalizations of average need.…”
Section: Testing Predictions Of Needs Probability With Lexical Access
Citation type: mentioning
confidence: 99%