We consider the problem of automatically inferring latent character types in a collection of 15,099 English novels published between 1700 and 1899. Unlike prior work in which character types are assumed responsible for probabilistically generating all text associated with a character, we introduce a model that employs multiple effects to account for the influence of extra-linguistic information (such as author). In an empirical evaluation, we find that this method leads to improved agreement with the preregistered judgments of a literary scholar, complementing the results of alternative models.
We use quantitative methods to analyze a collection of 21,367 scholarly articles in literary studies from 1889–2013. Our approach reveals aspects of our disciplinary history that have been occluded by existing histories’ emphasis on generational and methodological conflict. We demonstrate gradual, unnoted shifts in the themes and vocabularies of scholarship, including the long rise of new subjects (like violence); we show the surprising novelty of central theoretical concepts; and we explore transformations in the shared rationales for literary scholarship that exceed the boundaries of conventional labels like “New Criticism” and “New Historicism.” Though our method uses computational tools, we do not claim to provide a definitive or objective perspective on disciplinary history; instead, our approach, like the related methods of content analysis in the social sciences, allows us to pursue nuanced interpretations of the language of many texts at once.
This essay explores the changing significance of gender in fiction, asking especially whether its prominence in characterization has varied from the end of the eighteenth century to the beginning of the twenty-first. We have reached two conclusions, which may seem in tension with each other. The first is that gender divisions between characters have become less sharply marked over the last 170 years. In the middle of the nineteenth century, very different language is used to describe fictional men and women. But that difference weakens steadily as we move forward to the present; the actions and attributes of characters are less clearly sorted into gender categories. On the other hand, we haven't found the same progressive story in the history of authorship. In fact, there is an eye-opening, under-discussed decline in the proportion of fiction actually written by women, which drops by half (from roughly 50% of titles to roughly 25%) as we move from 1850 to 1950. The number of characters who are women or girls also drops. We are confronted with a paradoxical pattern. While gender roles were becoming more flexible, the space actually allotted to (real and fictional) women on the shelves of libraries was contracting sharply. We explore the evidence for this paradox and suggest a few explanations. This essay considers both the gender positions ascribed to authors as biographical personages, and the signs of gender they used in producing characters. In both cases, we understand gender as a conventional role that people were expected to assume in order to become legible in a social context. Authors and characters have
1. The evidence used in this paper has depended heavily on the labor of other hands. The HathiTrust corpus we use was processed by Boris Capitanu. The Chicago Novel Corpus was collected by Hoyt Long and Richard Jean So, and enriched with metadata by Teddy Roland. Conversation with Heather Love, Laura Mandell, and Allen Riddell, and the whole NovelTM research group, turned up valuable leads.
Quantitative methods have been central to the humanities since scholars began relying on full-text search to map archives. But the intellectual implications of search technology are rendered opaque by humanists’ habit of considering algorithms as arbitrary tools. To reflect more philosophically, and creatively, on the hermeneutic options available to us, humanists may need to converse with disciplines that understand algorithms as principled epistemological theories. We need computer science, in other words, not as a source of tools but as a theoretical interlocutor.