Stylometric and text categorization results show that author gender can be discerned in texts with relatively high accuracy. However, it is difficult to explain what gives rise to these results, and there are many possible confounding factors, such as the domain, genre, and target audience of a text. More fundamentally, such classification efforts risk invoking stereotyping and essentialism. We explore this issue in two datasets of Dutch literary novels, using widely used descriptive (LIWC, topic modeling) and predictive (machine learning) methods. Our results show the importance of controlling for variables in the corpus, and we argue for taking care not to overgeneralize from the results.
This paper explores building blocks in extant and emerging social media and the possibilities they offer to the scholarly edition in electronic form, positing that we are witnessing the nascent stages of a new social edition at the intersection of social media and digital editing. Beginning with a typological formulation of electronic scholarly editions, it considers activities common to humanities scholars who engage texts as expert readers, noting that many methods of engagement both reflect the interrelated nature of long-standing professional reading strategies and are social in nature. Extending this framework, the paper argues that the next steps in the scholarly edition's development, as it incorporates social media functionality, reflect the importance of traditional humanistic activities and workflows. These steps include collaboration, incorporating contributions by readers, and re-visioning the role of the editor away from ultimate authority and toward facilitator of reader involvement. Intended to provide a 'toolkit' for academic consideration, this discussion of the emerging social edition points to new methods of textual engagement in digital literary studies and is accompanied by two integral, detailed appendices: one addressing issues pertinent to online reading and interaction, and another on social networking tools.
Overview
We study perceptions of literariness in a set of contemporary Dutch novels. Experiments with machine learning models show that it is possible to automatically distinguish novels that are seen as highly literary from those that are seen as less literary, using surprisingly simple textual features. The most discriminating features of our classification model indicate that genre might be a confounding factor, but a regression model shows that we can also explain variation between highly literary and less literary novels within a genre.