2015
DOI: 10.1080/17470218.2014.964271

On the Advantages of Word Frequency and Contextual Diversity Measures Extracted from Subtitles: The Case of Portuguese

Abstract: We examined the potential advantage of the lexical databases using subtitles and present SUBTLEX-PT, a new lexical database for 132,710 Portuguese words obtained from a 78 million corpus based on film and television series subtitles, offering word frequency and contextual diversity measures. Additionally we validated SUBTLEX-PT with a lexical decision study involving 1920 Portuguese words (and 1920 nonwords) with different lengths in letters (M = 6.89, SD = 2.10) and syllables (M = 2.99, SD = 0.94). Multiple r…

Cited by 49 publications (60 citation statements)
References 56 publications
“…This database provides not only the token account of each word (i.e., word frequency), but also the proportion of films in which a word appears. As in previous research, the CD variable was operationalized as the proportion of films [documents] in which a word appears (see also Soares et al, 2015).…”
Section: Methods
mentioning
confidence: 99%
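The two measures described in the statement above can be illustrated with a minimal sketch. It assumes each film is already available as a list of word tokens; the function name `frequency_and_cd` and the toy `films` data are hypothetical and are not taken from SUBTLEX-PT or its tooling.

```python
from collections import Counter


def frequency_and_cd(documents):
    """Compute word frequency and contextual diversity (CD).

    documents: list of token lists, one list per film (an assumed input format).
    Returns (token_counts, cd), where cd[word] is the proportion of
    documents in which the word appears at least once.
    """
    token_counts = Counter()  # total occurrences of each word across all films
    doc_counts = Counter()    # number of films containing each word
    for tokens in documents:
        token_counts.update(tokens)
        doc_counts.update(set(tokens))  # count each word once per film
    n_docs = len(documents)
    cd = {word: count / n_docs for word, count in doc_counts.items()}
    return token_counts, cd


# Hypothetical toy corpus of three "films"
films = [
    ["o", "gato", "dorme"],
    ["o", "gato", "e", "o", "cão"],
    ["a", "casa"],
]
freq, cd = frequency_and_cd(films)
```

In this toy corpus, "o" occurs three times in total but in only two of the three films, so its frequency and its CD diverge, which is the distinction the two measures are meant to capture.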
“…In the past years, the effect of CD has received increasing attention in the field of word recognition. The basic finding is that the higher the number of contexts in which a word appears, the faster the word identification times (see also Cai & Brysbaert, 2010; Dimitropoulou, Duñabeitia, Avilés, Corral, & Carreiras, 2010; Perea, Soares, & Comesaña, 2013; Soares et al, 2015, for converging evidence). This effect is not restricted to single word identification tasks.…”
mentioning
confidence: 99%
“…Furthermore, we also made sure that the effect of plausibility was not confounded by other sources of information such as the lexical frequency of target words, which is known to mediate stimulus processing (see Rayner and Duffy (1986) for an example in reading research). In particular, we used the SUBTLEX-PT (Soares et al, 2015), which is the largest lexical database for the Portuguese language to date, computed the lexical frequency of plausible (2065 ± 3263) and implausible (1895 ± 4011) words and found no difference between conditions (t = .71, p = .5).…”
Section: Methods
mentioning
confidence: 99%
“…Indeed, contrary to the words' objective proprieties, mostly obtained from automatic (computational) procedures applied to large corpora (see, e.g., Soares, Machado, et al, 2015; Soares, Medeiros, et al, 2014, for recent examples of these procedures), collecting subjective proprieties is more demanding and time-consuming. Typically this implies conducting large-scale studies, and thus asking a great number of participants to rate a set of words in a given subjective dimension.…”
mentioning
confidence: 99%