2014
DOI: 10.1371/journal.pone.0111445

An Information Theoretic Clustering Approach for Unveiling Authorship Affinities in Shakespearean Era Plays and Poems

Abstract: In this paper we analyse the word frequency profiles of a set of works from the Shakespearean era to uncover patterns of relationship between them, highlighting the connections within authorial canons. We used a text corpus comprising 256 plays and poems from the 16th and 17th centuries, with 17 works of uncertain authorship. Our clustering approach is based on the Jensen-Shannon divergence and a graph partitioning algorithm, and our results show that authors' characteristic styles are very powerful factors in…
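The Jensen-Shannon divergence named in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: the helper names `word_profile` and `jensen_shannon_divergence`, the whitespace tokenisation, and the toy vocabulary are all assumptions made for illustration, and the graph partitioning step is not shown. It only shows how two word frequency profiles could be compared with a base-2 Jensen-Shannon divergence, which ranges from 0 (identical profiles) to 1 (profiles with no shared words).

```python
import math
from collections import Counter


def word_profile(text, vocabulary):
    """Relative frequency of each vocabulary word in a text (hypothetical helper).

    Tokenisation is a plain lowercase whitespace split, an assumption
    chosen only to keep the sketch short.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocabulary)
    if total == 0:
        raise ValueError("text contains none of the vocabulary words")
    return [counts[w] / total for w in vocabulary]


def jensen_shannon_divergence(p, q):
    """Base-2 Jensen-Shannon divergence between two discrete distributions."""
    # Mixture distribution M = (P + Q) / 2
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i = 0 contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy usage with an assumed four-word vocabulary.
vocab = ["the", "and", "of", "to"]
p = word_profile("the king and the queen of the realm", vocab)
q = word_profile("to be or not to be and to sleep", vocab)
divergence = jensen_shannon_divergence(p, q)
```

A pairwise matrix of such divergences over all works would be one plausible input to a clustering step, though the source does not specify how the authors construct theirs.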

Cited by 15 publications (6 citation statements)
References 30 publications
“…The two binary classes in this data set are plays (202) and poems (54). The “frequency of use” of 220 functional words has been extracted from a cohort of 66 907 words previously analyzed by Arefin et al. The observed frequencies in the different works of these 220 functional words are used as features. The goal is to identify “a subset of functional words” that can group the works into the two classes: plays and poems.…”
Section: Computational Results (mentioning)
confidence: 99%
“…Univariate filter methods start by individually ranking each feature by using a statistical test or a predefined criterion and then using some ad hoc method to select the best-ranked features. Examples of univariate filter-based feature selection methods can be found in the works of Arefin et al. and Vergara and Estévez. Understandably, such methods do not consider the pairwise correlation between feature values across the set of samples.…”
Section: Introduction (mentioning)
confidence: 99%
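The univariate filter idea described in the statement above can be sketched briefly. This is an illustration under stated assumptions, not the method of any cited work: the function name `rank_features`, the scoring criterion (absolute difference of class means divided by the summed class standard deviations), and the fixed `top_k` cut-off are all hypothetical choices. Each feature is scored on its own, which is why, as the statement notes, correlations between features are ignored.

```python
from statistics import mean, pstdev


def rank_features(samples, labels, top_k):
    """Score each feature independently and return the indices of the best top_k.

    samples: list of equal-length feature vectors
    labels:  list of 0/1 class labels, one per sample
    The score is an assumed criterion: |mean_0 - mean_1| / (std_0 + std_1).
    """
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        class0 = [row[j] for row, y in zip(samples, labels) if y == 0]
        class1 = [row[j] for row, y in zip(samples, labels) if y == 1]
        spread = pstdev(class0) + pstdev(class1)
        # Small constant avoids division by zero for constant features.
        score = abs(mean(class0) - mean(class1)) / (spread + 1e-12)
        scores.append((score, j))
    # Highest score first; the ad hoc selection step is a simple top_k cut.
    scores.sort(key=lambda pair: -pair[0])
    return [j for _, j in scores[:top_k]]


# Toy data: feature 0 separates the classes, feature 1 does not.
toy_samples = [[1.0, 5.0], [1.1, 1.0], [3.0, 5.1], [3.1, 0.9]]
toy_labels = [0, 0, 1, 1]
selected = rank_features(toy_samples, toy_labels, top_k=1)
```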
“…Vickers (2011) uses a tri-gram, or n-gram, approach, while (Hirsch and Craig, 2014) use function word frequency. They also use methods based on the Information Theoretic measure Jensen-Shannon divergence (JSD), and unsupervised graph partitioning clustering algorithms (Arefin et al., 2014). There are other techniques used in this period of Shakespearean analysis, including simple function words (Matthews and Merriam, 1993; Merriam and Matthews, 1994) and word adjacency networks (WANs) (Segarra et al., 2017), or looking at rare and unique phrases (Swaim, 2017).…”
Section: Introduction (mentioning)
confidence: 99%
“…Garrard et al. (2005) were also instrumental in highlighting Alzheimer's disease through changes in writing and used a different approach which included some other elements of language (nouns, verbs, adverbs and adjectives and function words, e.g., conjunctions, and pronouns) to create word lists. As Arefin et al. (2014) and Ferguson et al. (2014) point out: the study of the subtle language changes over the lifespan of well-known writers (Lancashire, 2010), including Iris Murdoch and Agatha Christie (e.g. Garrard et al., 2005; Van Velzen & Garrard, 2008; Lancashire & Hirst, 2009; Le, 2010; Le et al., 2011) and political figures (Garrard, 2009) has highlighted that Alzheimer's disease may be apparent years or even decades before anyone becomes aware of any symptoms of cognitive deterioration.…”
Section: Introduction (mentioning)
confidence: 99%