Automated Text Scoring (ATS) provides a cost-effective and consistent alternative to human marking. However, to achieve good performance, the predictive features of such systems typically need to be engineered manually by human experts. We introduce a model that forms word representations by learning the extent to which specific words contribute to a text's score. Using Long Short-Term Memory (LSTM) networks to represent the meaning of texts, we demonstrate that a fully automated framework achieves results that surpass those of similar approaches. To make our results more interpretable, and inspired by recent advances in visualizing neural networks, we introduce a novel method for identifying the regions of the text that the model finds most discriminative.
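As a rough illustration of this kind of architecture (not the authors' exact model), the sketch below scores a text with a bidirectional LSTM over learned word embeddings and a linear regression head; all names and hyperparameters (e.g. EssayScorer, embed_dim) are illustrative assumptions.

```python
# Minimal sketch of an LSTM-based text scorer; illustrative only, not the paper's model.
import torch
import torch.nn as nn

class EssayScorer(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # learned word representations
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, 1)            # regression head -> scalar score

    def forward(self, token_ids):                            # token_ids: (batch, seq_len)
        states, _ = self.lstm(self.embed(token_ids))         # (batch, seq_len, 2*hidden)
        text_repr = states.mean(dim=1)                       # pool over time into a text vector
        return self.head(text_repr).squeeze(-1)              # one predicted score per text

# Training would minimize mean squared error against human-assigned scores, e.g.
# loss = nn.MSELoss()(model(batch_ids), gold_scores)
```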
The choice associated with words is a fundamental property of natural languages. It lies at the heart of quantitative linguistics, computational linguistics, and the language sciences more generally. Information theory provides the tools to measure precisely the average amount of choice associated with words: the word entropy. Here, we use three parallel corpora, encompassing ca. 450 million words in 1916 texts and 1259 languages, to tackle some of the major conceptual and practical problems of word entropy estimation: dependence on text size, register, style, and estimation method, as well as the non-independence of words in co-text. We present two main findings. First, word entropies display relatively narrow, unimodal distributions; no language in our sample has a unigram entropy of less than six bits/word. We argue that this is in line with information-theoretic models of communication: languages are held in a narrow range by two fundamental pressures, word learnability and word expressivity, with a potential bias towards expressivity. Second, there is a strong linear relationship between unigram entropies and entropy rates. The entropy difference between words with and without co-textual information is narrowly distributed around roughly three bits/word. In other words, knowing the preceding text reduces the uncertainty of words by roughly the same amount across the languages of the world.
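For illustration, the naive plug-in estimates below compute the unigram word entropy and the conditional entropy of a word given its preceding word, whose difference corresponds to the co-textual reduction discussed above; the estimators used in the study are far more careful about sample-size bias, so this is only a sketch.

```python
# Plug-in (maximum-likelihood) estimates of unigram entropy and of the entropy
# conditioned on the preceding word; illustrative only, real estimators must
# correct for sample-size bias.
import math
from collections import Counter

def unigram_entropy(tokens):
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def bigram_conditional_entropy(tokens):
    # H(W_t | W_{t-1}) = H(W_{t-1}, W_t) - H(W_{t-1})
    pairs = list(zip(tokens, tokens[1:]))
    return unigram_entropy(pairs) - unigram_entropy(tokens[:-1])

tokens = "the cat sat on the mat and the dog sat on the rug".split()
h1, h2 = unigram_entropy(tokens), bigram_conditional_entropy(tokens)
print(f"unigram: {h1:.2f} bits/word, conditional: {h2:.2f}, reduction: {h1 - h2:.2f}")
```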
Recent work on Grammatical Error Correction (GEC) has highlighted the importance of language modeling: good performance can be achieved simply by comparing the probabilities of proposed edits. At the same time, advances in language modeling have produced linguistic output that is almost indistinguishable from human-written text. In this paper, we up the ante by exploring the potential of more sophisticated language models in GEC and offer some key insights into their strengths and weaknesses. We show that, in line with recent results in other NLP tasks, Transformer architectures achieve consistently high performance and provide a competitive baseline for future machine learning models.
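The core idea of comparing the probabilities of proposed edits can be sketched with an off-the-shelf Transformer language model; the snippet below uses Hugging Face's pretrained GPT-2 purely as an example and is not the paper's exact setup.

```python
# Hedged sketch: score an original sentence and a proposed edit with a pretrained
# Transformer LM and keep whichever the model finds more probable.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def avg_neg_log_likelihood(sentence):
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)   # loss = mean token-level cross-entropy
    return out.loss.item()

original = "He go to school every day."
edited = "He goes to school every day."
better = edited if avg_neg_log_likelihood(edited) < avg_neg_log_likelihood(original) else original
print("LM prefers:", better)
```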
Word frequencies are central to linguistic studies investigating processing difficulty, learnability, age of acquisition, diachronic transmission, and the relative weight given to a concept in society. However, there are few crosslinguistic studies on entire distributions of word frequencies, and even fewer on systematic changes within them. Here, we first define and test an exact measure for the relative difference between distributions, the Normalized Frequency Difference (NFD). We then apply this measure to parallel corpora, explaining systematic variation in the frequency distributions within the same language and across different languages. We further establish the NFD between lemmatized and unlemmatized corpora as a frequency-based measure of the inflectional productivity of a language. Finally, we argue that quantitative measures like the NFD can advance language typology beyond abstract, theory-driven expert judgments, towards more corpus-based, empirical, and reproducible analyses.
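The abstract does not spell out the NFD formula, so the sketch below only illustrates the general idea of comparing relative word-frequency distributions between two corpora; the normalization used here (half the sum of absolute differences of relative frequencies) is an assumed, simplified stand-in, not the paper's definition.

```python
# Illustrative comparison of two word-frequency distributions.
# NOTE: the exact Normalized Frequency Difference (NFD) is defined in the paper;
# this normalization is only an assumed, simplified stand-in.
from collections import Counter

def relative_freqs(tokens):
    counts = Counter(tokens)
    n = len(tokens)
    return {w: c / n for w, c in counts.items()}

def frequency_difference(tokens_a, tokens_b):
    pa, pb = relative_freqs(tokens_a), relative_freqs(tokens_b)
    vocab = set(pa) | set(pb)
    return 0.5 * sum(abs(pa.get(w, 0.0) - pb.get(w, 0.0)) for w in vocab)  # in [0, 1]

corpus_a = "the dog saw the dogs".split()
corpus_b = "the dog see the dog".split()   # e.g. a lemmatized or translated counterpart
print(frequency_difference(corpus_a, corpus_b))
```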
The choice associated with words is a fundamental property of natural languages. It lies at the heart of quantitative linguistics, computational linguistics, and the language sciences more generally. Information theory provides the tools to measure precisely the average amount of choice associated with words: the word entropy. Here we use three parallel corpora, encompassing ca. 450 million words in 1916 texts and 1259 languages, to tackle some of the major conceptual and practical problems of word entropy estimation: dependence on text size, register, style, and estimation method, as well as the non-independence of words in co-text. We present three main results: 1) a text size of 50K tokens is sufficient for word entropies to stabilize throughout the text; 2) across the languages of the world, word entropies display a unimodal distribution that is skewed to the right, suggesting a trade-off between the learnability and expressivity of words; 3) there is a strong linear relationship between unigram entropies and entropy rates, suggesting that they are inherently linked. We discuss the implications of these results for studying the diversity and evolution of languages from an information-theoretic point of view.
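Result 1, stabilization by roughly 50K tokens, can be eyeballed with a simple convergence check over growing prefixes of a text, reusing the plug-in unigram_entropy helper sketched earlier; again, this is an illustration rather than the estimation procedure used in the study.

```python
# Track how the plug-in unigram-entropy estimate changes as more of a text is
# read, to see at what prefix size it stabilizes (illustration only; reuses the
# unigram_entropy() helper from the earlier sketch).
def entropy_by_prefix(tokens, step=10_000):
    for end in range(step, len(tokens) + 1, step):
        yield end, unigram_entropy(tokens[:end])

# Example, assuming `corpus_tokens` is a list of word tokens from one text:
# for n, h in entropy_by_prefix(corpus_tokens):
#     print(f"{n:>7} tokens: {h:.3f} bits/word")
```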