We describe an educational demonstration interface and tools for stylometry (authorship attribution and profiling) and readability research for Dutch. The Stylene system consists of a popularisation interface for learning about stylometric analysis, and of web-based interfaces to software for readability and stylometry research aimed at researchers from the humanities and social sciences who do not want to develop or install such software themselves.
Introduction

The last decade has seen a marked increase in research on computational stylometry, the subarea of natural language processing that concerns itself with the categorisation of texts according to the psychological and sociological properties of their authors. Also called text profiling, this research tries to develop systems, mostly based on text analytics techniques, that combine natural language processing and machine learning methods. These systems are trained to determine whether the author of a text is male or female, their education level, region of origin, personality, and even mental health, whether they are a native speaker or not, and many other potentially useful attributes. Of course, authorship attribution research has existed for a long time, and is in a sense the limit case of computational stylometry: supposing that everyone has a unique combination of demographic, psychological and idiosyncratic style properties, this would be their idiolect or 'stylome' (Van Halteren et al., 2005; Coulthard, 2004), and it should be possible to assign texts of unknown authorship to specific authors provided that models of their stylome exist.