Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR) 2014
DOI: 10.3115/v1/w14-1213

Classifying easy-to-read texts without parsing

Abstract: Document classification using automated linguistic analysis and machine learning (ML) has been shown to be a viable road forward for readability assessment. The best models can be trained to decide if a text is easy to read or not with very high accuracy, e.g. a model using 117 parameters from shallow, lexical, morphological and syntactic analyses achieves 98.9% accuracy. In this paper we compare models created by parameter optimization over subsets of that total model to find out to which extent different high…

Citations: cited by 8 publications (15 citation statements)
References: 7 publications
“…This can follow from the fact that these features can be considered as proxies of the syntactic structure which in these experiments was represented through specific features: in this situation, the grafting process preferred syntactic features over morpho-syntactic ones, in spite of the lower accuracy of the dependency parser with respect to the part-of-speech tagger. Interestingly, this result is in contrast with what reported by Falkenjack and Jönsson (2014) for what concerns document readability assessment, who claim that an optimal subset of text features for readability based document classification does not need features induced via parsing. Among the morphosyntactic features, it appears that verbal features play an important role: this can follow both by the language dealt with which is a morphologically rich language, and by the fact that these features do not have a counterpart at the syntactic level.…”
Section: Sentence Vs Document Classification
Type: contrasting
confidence: 55%
“…They show that a small number of features can attain a high accuracy. (Falkenjack and Jönsson, 2014) However their work is carried out at document level and is not consistent with the results of Dell'Orletta et al at sentence level.…”
Section: Related Work
Type: mentioning
confidence: 67%
“…Contrasting these results quickly to previous research on the classification performance of linguistic features in the context of Support Vector Machines (Falkenjack and Jönsson 2014; Falkenjack, Mühlenbock, and Jönsson 2013) we can see that our results are quite different. Falkenjack and Jönsson (2014) found that the ratio of relative/interrogative pronouns performed barely better than chance on the task of classifying mixed-genre easy-to-read texts. The ratio of SweVoc words and the ratio of rightward dependencies were clearly better than chance but were not among the strongest predictors.…”
Section: Posterior For β
Type: mentioning
confidence: 94%