This report presents work on the development of a new corpus of non-native English writing. The corpus will be useful for native language identification, as well as for grammatical error detection and correction and for automatic essay scoring. This report describes the corpus in detail.
In this article, the authors describe procedures used in the development of a new scale of militant extremist mindset. A 2-step approach consisted of (a) linguistic analysis of the texts produced by known terrorist organizations and selection of statements from these texts that reflect the mindset of those belonging to these organizations and (b) analyses of the structural properties of the scales based on 132 selected statements. Factor analysis of militant extremist statements with participants (N = 452) from Australia, Serbia, and the United States produced 3 dimensions: (a) justification and advocacy of violence (War factor), (b) violence in the name of God (God factor), and (c) blaming Western nations for the problems in the world today (West factor). We also report the distributions of scores for the 3 subscales, mean differences among the 3 national samples, and correlations with a measure of dogmatism (M. Rokeach, 1956).
Educational assessment applications, as well as other natural-language interfaces, need some mechanism for validating user responses. If the input provided to the system is infelicitous or uncooperative, the proper response may be to simply reject it, to route it to a bin for special processing, or to ask the user to modify the input. If problematic user input is instead handled as if it were the system's normal input, this may degrade users' confidence in the software or suggest ways in which they might try to "game" the system. Our specific task in this domain is the identification of student essays that are "off-topic", i.e., not written to the test question topic. Identification of off-topic essays is of great importance for the commercial essay evaluation system Criterion℠. Previous methods for this task required 200–300 human-scored essays for training. However, there are situations in which no essays are available for training, such as when users (teachers) wish to spontaneously write a new topic for their students. For such cases, we need a system that works reliably without training data. This paper describes an algorithm that detects when a student's essay is off-topic without requiring a set of topic-specific essays for training. The new system is comparable in performance to previous models that require topic-specific essays for training, and it provides more detailed information about how an essay diverges from the requested essay topic.
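The abstract above does not spell out the detection algorithm itself, but the general idea of flagging an essay whose content diverges from the prompt, without any topic-specific training essays, can be illustrated with a minimal sketch. The sketch below is an assumption for illustration only (the function names, the stopword list, and the `threshold` value are all hypothetical, not taken from the paper): it compares bag-of-words vectors of the essay and the prompt by cosine similarity and flags low-overlap essays as off-topic.

```python
from collections import Counter
import math

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "that",
             "it", "for", "my"}


def content_words(text: str) -> Counter:
    """Bag-of-words vector over lowercased, punctuation-stripped tokens."""
    tokens = [t.strip(".,!?;:").lower() for t in text.split()]
    return Counter(t for t in tokens if t and t not in STOPWORDS)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_off_topic(essay: str, prompt: str, threshold: float = 0.1) -> bool:
    """Flag an essay whose vocabulary barely overlaps the prompt's.

    The threshold is a hypothetical tuning parameter, not the paper's.
    """
    return cosine_similarity(content_words(essay),
                             content_words(prompt)) < threshold
```

Because the comparison is against the prompt text alone, no topic-specific essays are needed; the per-word similarity contributions could also be inspected to report which prompt vocabulary the essay fails to address.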
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and indicate whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.