Keywords: language change, style analysis, regression
This study focuses on modelling general and individual language change over several decades. A timeline prediction task was used to identify interesting temporal features. Our previous work achieved high accuracy in predicting publication year, using lexical features marked for syntactic context. In this study, we use four feature types (character, word stem, part-of-speech, and word n-grams) to predict publication year, and then use associated models to determine constant and changing features in individual and general language use. We do this for two corpora, one containing texts by two different authors, published over a fifty-year period, and a reference corpus containing a variety of text types, representing general language style over time, for the same temporal span as the two authors. Our linear regression models achieve good accuracy with the two-author data set, and very good results with the reference corpus, bringing to light interesting features of language change.
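As a rough illustration of two of the feature types named above (character and word n-grams), the sketch below computes relative n-gram frequencies for a text. The function names, the whitespace tokenisation, and the lowercasing are assumptions made for this example; the abstract does not describe the authors' actual preprocessing.

```python
from collections import Counter


def relative_freqs(items):
    """Map each distinct item to its share of all items (empty dict if none)."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: c / total for item, c in counts.items()} if total else {}


def char_ngrams(text, n):
    """Relative frequencies of overlapping character n-grams."""
    return relative_freqs(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Relative frequencies of word n-grams.

    Assumption: tokens are lowercased and split on whitespace only.
    """
    tokens = text.lower().split()
    return relative_freqs(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


if __name__ == "__main__":
    sample = "the cat saw the cat"
    print(char_ngrams(sample, 3))
    print(word_ngrams(sample, 2))
```

Each text would then be represented by a vector of such frequencies, one dimension per n-gram, before being passed to a regression model.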
We examine stylochronometry, the question of measuring change in linguistic style over time within an authorial canon and in relation to change in language in general use over a contemporaneous period. We take the works of two prolific authors from the 19th/20th century, Henry James and Mark Twain, and identify variables that change for them over time. We present a method of analysis applying regression on linguistic variables in predicting a temporal variable. In order to identify individual authors' effects on the model, we compare the model based on the novelists' works to a model based on a 19th/20th century American English reference set. We evaluate using R² and root mean square error (RMSE), which indicates the average error in predicting the year. On the two-author data, we achieve an RMSE of ±7.2 years on unseen data (baseline: ±13.2); for the larger reference set, our model obtains an RMSE of ±4 years on unseen data (baseline: ±17).
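The evaluation described above can be sketched as follows: fit a regression on training texts, predict the year for unseen texts, and compare RMSE and R² against a baseline. This is a minimal sketch under stated assumptions: it uses a single made-up feature and invented toy years, and it assumes the baseline predicts the mean training year for every text, which the abstract does not specify.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the target (years)."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy data (invented for illustration): one feature frequency per text.
train_x = [0.10, 0.12, 0.15, 0.18, 0.20, 0.23]
train_y = [1870, 1875, 1882, 1890, 1895, 1902]
test_x = [0.11, 0.17, 0.22]
test_y = [1873, 1887, 1900]

slope, intercept = fit_line(train_x, train_y)
model_pred = [slope * x + intercept for x in test_x]

# Assumed baseline: always predict the mean training year.
mean_year = sum(train_y) / len(train_y)
baseline_pred = [mean_year] * len(test_y)

model_rmse = rmse(test_y, model_pred)
baseline_rmse = rmse(test_y, baseline_pred)

if __name__ == "__main__":
    print(f"model RMSE:    {model_rmse:.2f} years")
    print(f"baseline RMSE: {baseline_rmse:.2f} years")
    print(f"model R^2:     {r2(test_y, model_pred):.3f}")
```

Because RMSE is expressed in years, a model RMSE well below the baseline RMSE means the linguistic features carry real information about publication date.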
This work investigates linguistic changes in a corpus of literary authors that have been hypothesised to be attributable to the effects of ageing. In part, the analysis replicates an earlier study into these effects, but adds to it by explicitly analysing and modelling competing factors, specifically the influence of background language change. Our results suggest that this underlying change in language usage is likely the primary driver of the change observed in the linguistic variables that was previously attributed to linguistic ageing.
We build on past research in distinguishing English translations from originally English text, and in guessing the source language where the text is deemed to be a translation. We replicate an extant method in relation to both a reconstruction of the original data set and a fresh data set compiled on an analogous basis. We extend this with an analysis of the features that emerge from the combined data set. Finally, we report on an inverse use of the method, not as guessing the source language of a translated text, but as a tool in quality estimation, marking a text as requiring inspection if it is guessed to be a translation, rather than a text composed originally in the language analysed. We obtain c. 80% accuracy, comparable to results of earlier work in literary source language guessing; this supports the claim of the method's validity in identifying salient features of source language interference.