This paper presents the results of a pilot study on using automatic text categorization techniques to identify online sexual predators. We report on our SVM and k-NN models. Our distance-weighted k-NN classifier reaches an F-measure of 0.943 on test data distinguishing the predator and the victim sides of text chats between sexual predators and volunteers posing as underage victims.
This paper reports on the first stage of building an educational tool for international graduate students to improve their academic writing skills. Taking a text-categorization approach, we experimented with several models to automatically classify sentences in research article introductions into one of three rhetorical moves. The paper begins by situating the project within the larger framework of intelligent computer-assisted language learning. It then presents the details of the study with very encouraging results. The paper then concludes by commenting on how the system may be improved and how the project is intended to be pursued and evaluated.
This paper reports on the development of the analysis engine for the Research Writing Tutor (RWT), an automated writing evaluation (AWE) program designed to provide genre- and discipline-specific feedback on the functional units of research article discourse. Unlike traditional NLP-based applications that categorize complete documents, RWT's analyzer categorizes every sentence in the text as both a communicative move and a rhetorical step. We describe the construction of a cascade of two support vector machine classifiers trained on a multi-disciplinary corpus of annotated Introduction texts. This work not only demonstrates the usefulness of NLP for automated genre analysis, but also paves the way for future AWE endeavors and forms of automated feedback that could facilitate construction of functional meaning in writing.
This study piloted test items that will be used in a computer-delivered and scored test of productive grammatical ability in English as a second language (ESL). Findings from research on learners’ development of morphosyntactic, syntactic, and functional knowledge were synthesized to create a framework of grammatical features. We outline the interpretive argument and present results from four pilot test administrations in terms of (a) reliability, (b) relationships between item difficulties and developmental stages, (c) correlations with other English tests, and (d) predictability of test scores in relation to proficiency levels. The results support the potential of assessing productive ESL grammatical ability by targeting areas identified in SLA research, and the plausibility of moving forward with computer delivery and scoring.