The words we use in everyday language reveal our thoughts, feelings, personality, and motivations. Linguistic Inquiry and Word Count (LIWC) is a software program that analyses text by counting words in 66 psychologically meaningful categories, which are catalogued in a dictionary of words. This article presents the Dutch translation of the dictionary that is part of the LIWC 2007 version. It describes the LIWC instrument and compares the Dutch and English dictionaries on a corpus of parallel texts. The two dictionaries gave similar results, except for a small number of word categories. Correlations between word counts in the two languages were high to very high, while effect sizes of the differences between word counts were low to medium. The LIWC 2007 categories can now be used to analyse Dutch-language texts.
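At its core, LIWC-style counting matches each word of a text against the entries of every category and reports, per category, the percentage of all words that matched. The following is a minimal Python sketch of that idea, not LIWC's own implementation. It assumes a tiny hypothetical dictionary (`TOY_DICTIONARY`, not the real LIWC 2007 or Dutch dictionary), a simple regex tokenizer, and illustrative function names. A trailing `*` marks a prefix wildcard, and a word may count toward several categories.

```python
import re
from collections import Counter

# Hypothetical toy dictionary for illustration only; NOT the LIWC 2007 dictionary.
# A trailing "*" marks a prefix wildcard (e.g. "happ*" matches "happy", "happiness").
TOY_DICTIONARY = {
    "posemo": ["happ*", "love", "nice"],
    "negemo": ["sad", "hate", "hurt*"],
    "i": ["i", "me", "my"],
}


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[\w']+", text.lower())


def matches(word, entry):
    """Return True if the word matches an entry, exactly or via a prefix wildcard."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def category_percentages(text, dictionary):
    """Return, for each category, the percentage of tokens that match it."""
    tokens = tokenize(text)
    counts = Counter()
    for token in tokens:
        for category, entries in dictionary.items():
            # A token can contribute to several categories, but at most once per category.
            if any(matches(token, entry) for entry in entries):
                counts[category] += 1
    total = len(tokens)
    return {c: (100.0 * counts[c] / total if total else 0.0) for c in dictionary}


if __name__ == "__main__":
    print(category_percentages("I am happy but my friend is sad", TOY_DICTIONARY))
```

For the sample sentence (8 tokens), "happy" falls under `posemo`, "sad" under `negemo`, and "I" and "my" under `i`, giving 12.5%, 12.5% and 25.0% respectively. Applying the same function to a Dutch text with a translated dictionary would produce the per-category scores that the article compares across languages.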
According to musicological studies on oral transmission, repeated patterns are important for determining musical similarity in folk songs. In this paper, we study the relevance of repeated patterns for modelling similarity and compression in a retrieval setting. Using a dataset of 360 Dutch folk songs, we compare the classification accuracy obtained with human-annotated patterns against that obtained with patterns retrieved automatically by pattern discovery algorithms. We propose a framework that uses these patterns for compression and for classification into tune families. The annotated patterns allow us to compress the songs by 60% at the cost of a 3-percentage-point decrease in classification accuracy. However, none of the automatic pattern discovery algorithms reaches a similar combination of compression ratio and retrieval accuracy. We conclude that repeated patterns are relevant for similarity estimation and compression, but that state-of-the-art automatic pattern discovery cannot compete with expert annotations in this retrieval setting.
Prominent among the social developments that Web 2.0 has facilitated is digital social reading (DSR): many platforms offer functionalities for writing book reviews, 'inline' commenting on book texts, online story writing (often in the form of fanfiction), informal book discussions, book vlogs, and more. In this article we argue that DSR offers unique possibilities for research into literature, reading, the impact of reading, and literary communication. We also claim that computational tools are especially relevant in this context, which makes DSR particularly suitable for the application of Digital Humanities methods. We draw up an initial categorization of research aspects of DSR and briefly review the literature for each category. We distinguish studies that use DSR as a lens on wider processes of literary exchange from studies that treat DSR culture as a phenomenon of interest in its own right. Through seven examples of DSR research, we discuss the chosen approaches and their connection to research questions in literary studies.
Linguistic Inquiry and Word Count (LIWC) is a text analysis program developed by James Pennebaker and colleagues. At the basis of LIWC is a dictionary that assigns words to categories. This dictionary is specific to English. Researchers who want to use LIWC on non-English texts have typically relied on translations of the dictionary into the language of the texts. Dictionary translation, however, is a labour-intensive procedure. In this paper, we investigate an alternative approach: using Machine Translation (MT) to translate the texts to be analysed into English, and then analysing them with the English dictionary. We test several LIWC versions, languages and MT engines, and consistently find that the machine-translated-text approach performs better than the translated-dictionary approach. We argue that for languages with effective MT technology, there is no need to create new LIWC dictionary translations.