2014
DOI: 10.3758/s13428-014-0459-x

Constructing and validating readability models: the method of integrating multilevel linguistic features with machine learning

Abstract: Multilevel linguistic features have been proposed for discourse analysis, but there have been few applications of multilevel linguistic features to readability models and also few validations of such models. Most traditional readability formulas are based on generalized linear models (GLMs; e.g., discriminant analysis and multiple regression), but these models have to comply with certain statistical assumptions about data properties and include all of the data in formula construction without pruning the outliers…
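The abstract contrasts GLM-based readability formulas with a machine-learning approach that feeds multilevel linguistic features into a classifier. The sketch below is only a minimal illustration of that general idea, not the authors' implementation: the feature matrix X (one row of linguistic features per text), the grade labels y, the RBF kernel, and the linear-discriminant baseline are all assumptions made for the example.

    # Minimal sketch of the general approach (not the authors' code):
    # multilevel linguistic features -> SVM grade-level classifier,
    # with a discriminant-analysis baseline for comparison.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Hypothetical data: rows = texts, columns = multilevel linguistic
    # features (word, syntax, semantics, cohesion); labels = grades 1-6.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 30))
    y = rng.integers(1, 7, size=300)

    svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    lda = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())

    print("SVM accuracy:", cross_val_score(svm, X, y, cv=5).mean())
    print("LDA accuracy:", cross_val_score(lda, X, y, cv=5).mean())

With real texts, X would be produced by a feature-extraction tool rather than sampled at random; the cross-validated comparison is the part that carries over.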

Cited by 37 publications (22 citation statements). References 42 publications.

Citation statements:
“…We also noticed that among the 30 selected features only the top 25 were included to create a readability model that is valid for categorizing CFL texts, with an accuracy rate of 75% (exact accuracy) and 99% (adjacent accuracy), compared to a baseline of 44% and 87%, respectively. The superior performance of our model over the baseline lends support to those researchers who claimed that a small number of features may not represent the complex cognitive process of reading comprehension (Bailin & Grafstein; Bruce et al.; Graesser et al.; Sung et al.), because we included a higher number of features as well as more complex features than the baseline.…”
Section: Discussion (supporting, confidence: 73%)
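The excerpt reports both exact and adjacent accuracy, where a prediction is counted as adjacent-correct when it falls within one grade level of the true label (the usual reading of the term). As a rough illustration of how the two metrics are computed, with hypothetical labels:

    # Sketch: exact vs. adjacent accuracy for integer grade-level predictions.
    import numpy as np

    def exact_accuracy(y_true, y_pred):
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        return float(np.mean(y_true == y_pred))

    def adjacent_accuracy(y_true, y_pred, tolerance=1):
        # Correct if the predicted grade is within `tolerance` of the truth.
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        return float(np.mean(np.abs(y_true - y_pred) <= tolerance))

    # Hypothetical example: true grades vs. predicted grades.
    y_true = [1, 2, 3, 4, 5, 6]
    y_pred = [1, 3, 3, 5, 5, 4]
    print(exact_accuracy(y_true, y_pred))     # 0.5
    print(adjacent_accuracy(y_true, y_pred))  # ~0.83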
“…Using the SVM model along with 24 linguistic features, the readability prediction of language arts textbooks used in grades 1-6 reached an accuracy of 72.92%. Sung, Chen, et al. (2015) confirmed that the prediction accuracy of the SVM model with 32 linguistic features outperformed that of discriminant analysis models in classifying language arts textbooks used in grades 1-6. In a preliminary study, Sung, Lin, and Tseng (2014) further found that CRIE 2.0 achieved similar results for language arts texts used in grades 1-9.…”
Section: CRIE 2.0 (supporting, confidence: 54%)
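These comparisons vary the number of linguistic features fed to the SVM (24 vs. 32). A common way to explore how feature count affects accuracy, shown here only as a generic sketch with hypothetical data rather than the CRIE pipeline, is to rank features and retrain the classifier on the top-k subset:

    # Generic sketch: accuracy as a function of how many linguistic
    # features are kept (hypothetical data, not CRIE's pipeline).
    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(1)
    X = rng.normal(size=(240, 32))      # 32 candidate linguistic features
    y = rng.integers(1, 7, size=240)    # grade levels 1-6

    for k in (8, 16, 24, 32):
        model = make_pipeline(StandardScaler(),
                              SelectKBest(f_classif, k=k),
                              SVC(kernel="rbf"))
        acc = cross_val_score(model, X, y, cv=5).mean()
        print(f"top-{k} features: accuracy = {acc:.2f}")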
“…Previous research (Sung, Chen, et al., 2015) validated the effectiveness of CRIE 2.0. Using the SVM model along with 24 linguistic features, the readability prediction of language arts textbooks used in grades 1-6 reached an accuracy of 72.92%.…”
Section: CRIE 2.0 (mentioning, confidence: 91%)
“…As Sebastiani (2001) argued, the algorithm generated by such a machine-learning approach often achieves accuracy comparable to that of human experts. Automatic text classification by machine learning is used in various fields of text analysis, such as scoring student essays (Hastings, Hughes, Magliano, Goldman, & Lawless, 2012), testing the readability of texts (Sung et al., 2015), and investigating the grammatical development of child language (Hassanali, Liu, Iglesias, Solorio, & Dollaghan, 2014). Attempts to develop a classifier of both suicide notes and depressed tweets have also been successful, achieving approximately 80% prediction accuracy (De Choudhury et al., 2013; Pestian et al., 2010).…”
(mentioning, confidence: 99%)
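As a purely illustrative sketch of the kind of supervised text-classification pipeline this excerpt describes (bag-of-words features plus a linear classifier; the toy texts and "low"/"high" readability labels are invented for the example and are not data from any cited study):

    # Illustrative text-classification sketch: TF-IDF features plus a
    # linear SVM, the general setup Sebastiani-style surveys describe.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    texts = ["The cat sat on the mat.",
             "Dogs run fast in the park.",
             "The ramifications of fiscal consolidation remain contested.",
             "Quantum decoherence constrains macroscopic superposition."]
    labels = ["low", "low", "high", "high"]   # toy readability labels

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    clf.fit(texts, labels)
    print(clf.predict(["The dog sat in the park."]))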