This paper proposes a method for detecting errors in article usage and singular/plural usage based on the mass/count distinction. First, the method learns decision lists for distinguishing mass and count nouns from automatically generated training data. Then, to improve performance, the lists are augmented with feedback obtained from learners' writing. Finally, errors are detected by applying rules to the resulting mass/count distinctions. Experiments show that the method, when augmented with feedback, achieves a recall of 0.71 and a precision of 0.72 and outperforms the other methods used for comparison.
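As a rough illustration of the decision-list idea described above (the rules, words, and default below are invented for illustration and are not the paper's actual learned lists), a decision list is an ordered sequence of rules where the first rule that matches the context decides the classification:

```python
# Hypothetical decision-list sketch: rules are ordered by strength,
# and the first rule whose trigger word appears in the context
# decides mass vs. count; an ordered default applies otherwise.
RULES = [
    ("much", "mass"),    # "much water" -> mass
    ("many", "count"),   # "many books" -> count
    ("a", "count"),      # "a book"     -> count
]
DEFAULT = "count"

def classify(context_words):
    """Return 'mass' or 'count' for a noun given its context words."""
    for trigger, label in RULES:
        if trigger in context_words:
            return label
    return DEFAULT

print(classify(["so", "much"]))  # mass
```

Feedback from learner writing could then be incorporated by reordering or reweighting these rules, though the exact mechanism is specific to the paper.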
Abstract. This paper proposes a method for detecting errors in article usage and singular/plural usage based on the mass/count distinction. Although the mass/count distinction is particularly important for detecting these errors, it has been pointed out that heuristic rules for distinguishing mass and count nouns are hard to construct. To solve this problem, the proposed method first collects instances of mass and count nouns automatically from a corpus by exploiting surface information. Then, words surrounding the mass (count) instances are weighted based on their frequencies. Finally, the weighted words are used to distinguish mass and count nouns. Once mass and count nouns are distinguished, the above errors can be detected by a small set of heuristic rules. Experiments show that the proposed method distinguishes mass and count nouns in the writing of Japanese learners of English with an accuracy of 93% and detects 65% of article errors with a precision of 70%.
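The weighting step can be sketched as follows. This is a minimal illustration under the assumption that weights are relative frequencies of surrounding words; the example contexts and the scoring scheme are invented, not the paper's actual formulation:

```python
from collections import Counter

def train_weights(contexts):
    """contexts: list of (surrounding_words, label) pairs,
    where label is 'mass' or 'count'.
    Returns per-label weights: relative frequency of each
    word around instances of that label."""
    counts = {"mass": Counter(), "count": Counter()}
    for words, label in contexts:
        counts[label].update(words)
    totals = {lbl: sum(c.values()) or 1 for lbl, c in counts.items()}
    return {lbl: {w: n / totals[lbl] for w, n in c.items()}
            for lbl, c in counts.items()}

def classify(words, weights):
    """Classify a new noun by summing the weights of its context words
    for each label and picking the larger score."""
    scores = {lbl: sum(wts.get(w, 0.0) for w in words)
              for lbl, wts in weights.items()}
    return max(scores, key=scores.get)

weights = train_weights([
    (["much", "of"], "mass"),
    (["a", "little"], "mass"),
    (["two", "many"], "count"),
    (["a", "few"], "count"),
])
print(classify(["much"], weights))  # mass
```

In practice the instances would be collected automatically from a corpus using surface cues (e.g. determiners and number), as the abstract describes.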
SUMMARY: It has been recognized that existing methods for rating English texts by reading level are mostly aimed at native speakers of English and are therefore not entirely appropriate for Japanese learners of the language. Here we propose a method for rating English texts by reading level that specifically targets Japanese learners. To rate the reading level of a text for a Japanese learner of English, our method takes two types of information about the text into account: vocabulary and grammatical structure. Specifically, we use a vocabulary list and a parser to extract particularly difficult vocabulary items and grammatical structures as features. Two types of model are used to rate a text's reading level: multiple regression and neural networks. Our experiments show that the proposed methods rate the reading level of a text with an average accuracy of 75% for multiple regression and 81.3% for neural networks, both improvements over existing methods.
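The multiple-regression variant can be sketched as below. The feature set and training data are invented for illustration (here, just counts of difficult vocabulary items and difficult grammatical structures); the paper's actual features and targets differ:

```python
import numpy as np

# Toy data: each row is [hard_vocab_count, hard_grammar_count],
# and y is the (synthetic) reading level of the text.
X = np.array([[2, 1], [5, 3], [8, 6], [12, 9]], dtype=float)
y = np.array([1.0, 1.8, 2.7, 3.8])

# Add an intercept column and fit by ordinary least squares.
A = np.hstack([np.ones((len(X), 1)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def rate(hard_vocab, hard_grammar):
    """Predict a reading level from the two feature counts."""
    return float(coef[0] + coef[1] * hard_vocab + coef[2] * hard_grammar)

print(round(rate(2, 1), 2))  # 1.0
```

A neural-network model would replace the linear predictor with a small feed-forward network over the same features, which is presumably what accounts for the higher reported accuracy.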