2010
DOI: 10.3389/fpsyg.2010.00218

Subtitle-Based Word Frequencies as the Best Estimate of Reading Behavior: The Case of Greek

Abstract: Previous evidence has shown that word frequencies calculated from corpora based on film and television subtitles can readily account for reading performance, since the language used in subtitles closely approximates everyday language. The present study examines this issue in a society with increased exposure to subtitle reading. We compiled SUBTLEX-GR, a subtitle-based corpus consisting of more than 27 million Modern Greek words, and tested to what extent subtitle-based frequency estimates and those taken from…
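
As a minimal sketch of how subtitle-based frequency estimates can be derived, the Python below counts word tokens across a set of subtitle texts and reports, for each word, the raw count, the frequency per million words, and log10(count + 1), two normalizations often reported in frequency norms. The tokenizer (lowercasing and keeping runs of Unicode letters) and the function name `frequency_table` are assumptions for illustration, not the procedure used to build SUBTLEX-GR.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase, then keep runs of Unicode letters only
    # (digits and underscores excluded), so Greek words survive intact.
    return re.findall(r"[^\W\d_]+", text.lower())


def frequency_table(subtitle_texts: list[str]) -> dict[str, tuple[int, float, float]]:
    """Map each word to (raw count, count per million tokens, log10(count + 1))."""
    counts: Counter[str] = Counter()
    for text in subtitle_texts:
        counts.update(tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        word: (n, n / total * 1_000_000, math.log10(n + 1))
        for word, n in counts.items()
    }


if __name__ == "__main__":
    # Toy input, purely illustrative.
    for word, row in frequency_table(["το σπίτι το μεγάλο", "το σπίτι"]).items():
        print(word, row)
```

The per-million value makes counts comparable across corpora of different sizes; the log transform compresses the long tail of very frequent words.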

Cited by 46 publications (58 citation statements). References 49 publications (110 reference statements).

Citation statements:
“…Given that the correlations involving objective frequency measures are less high than those involving the subjective frequency measure, this suggests that the objective frequency measures computed for idiomatic expressions are a less valid index of the true frequencies of encounter of the expressions. In the word reading literature, it is assumed that the higher the correlation of a given word frequency measure is with RT data (e.g., lexical decision), the better this measure indexes the true frequency with which the participants have encountered the words (Dimitropoulou, Duñabeitia, Avilés, Corral, & Carreiras, 2010). We follow the same line of reasoning here.…”
Section: Scoring of the RT Data (mentioning)
confidence: 78%
“…Surface frequency values gathered from subtitles are the values that we opted for, because subtitle frequencies are available for the Spanish and English languages and are also thought to be more representative of the language in use than are printed frequencies (Cuetos, González-Nosti, Barbón, & Brysbaert, 2011; Dimitropoulou, Duñabeitia, Avilés, Corral, & Carreiras, 2010).…”
Section: Word Frequency (mentioning)
confidence: 99%
“…Recent work has found that the number of different contexts in which a word occurs can be more informative than the token frequency (Adelman, Brown, & Quesada, 2006; Brysbaert & New, 2009; Dimitropoulou et al, 2010; Keuleers et al, 2010; Perea, Soares, & Comesaña, 2013). The original EsPal subtitles database described above uses all the files available, so some shows are multiply represented.…”
Section: Subtitle Corpus Contextual Diversity Processing (mentioning)
confidence: 99%
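
Contextual diversity can be sketched as the number of distinct contexts in which a word appears, computed alongside its ordinary token count. In the sketch below each subtitle file carries a hypothetical `show_id`, and diversity is counted per distinct show, so several files from the same show (the multiple-representation issue raised in the statement above) do not inflate it, while token counts still sum over every file. The input format and the function name are assumptions for illustration, not the EsPal procedure.

```python
import re
from collections import Counter, defaultdict


def token_freq_and_cd(files: list[tuple[str, str]]) -> tuple[Counter, dict[str, int]]:
    """files: (show_id, subtitle text) pairs; show_id is a hypothetical label.

    Returns (token frequency over all files, contextual diversity per word),
    where contextual diversity = number of distinct show_ids containing the word.
    """
    token_freq: Counter = Counter()
    shows_with_word: defaultdict[str, set[str]] = defaultdict(set)
    for show_id, text in files:
        tokens = re.findall(r"[^\W\d_]+", text.lower())
        token_freq.update(tokens)          # every occurrence counts
        for word in set(tokens):           # each show counts once per word
            shows_with_word[word].add(show_id)
    cd = {word: len(shows) for word, shows in shows_with_word.items()}
    return token_freq, cd
```

With this split, a word repeated heavily inside one show gets a high token count but a contextual diversity of 1, which is exactly the case where the two measures diverge.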
“…A number of studies have shown that, across many languages, word frequencies derived from movie subtitle corpora provide a better account for various psycholinguistic effects (Brysbaert, New, & Keuleers, 2012; Cai & Brysbaert, 2010; Cuetos-Vega et al, 2011; Dimitropoulou, Duñabeitia, Avilés, Corral, & Carreiras, 2010; Keuleers, Brysbaert, & New, 2010; New, Brysbaert, Veronis, & Pallier, 2007). However, properties from written corpora have in the past been more common and may better predict some phenomena, so it is useful to have different sources of data available for researchers, depending on their goals.…”
mentioning
confidence: 99%