2013
DOI: 10.1093/llc/fqt028

Language chunking, data sparseness, and the value of a long marker list: explorations with word n-grams and authorial attribution

Cited by 31 publications
(4 citation statements)
References 13 publications
“…Zeta was developed by Burrows (2007) and has been used in a range of investigations comparing authorial styles for attribution (e.g. Antonia et al, 2014), as well as to investigate chronological style changes in the work of a single author, such as Hoover’s study of Henry James (Hoover, 2007). In the present study, zeta is used to confirm the temporal groupings suggested by the exploratory results obtained from the PCA and cluster analyses.…”
Section: Methods (mentioning)
confidence: 99%
“…1) each MELD text attests to a unique set of character n-grams; 2) such unique sets are more similar among texts that share a similar language variant. The use of words or characters as the basis of n-grams in authorship attribution has been discussed widely in earlier literature (see e.g. [Hoover 2002, 2003, 2012], [Koppel et al, 2009], [Stamatatos, 2009], [Eder, 2011], and [Alexis et al, 2014]). The use of character n-grams means that the units of analysis have very little to do with the linguistic units of syllables, morphemes, and words ([Eder 2015]), i.e.…”
Section: Meld and Analysis Of Data (mentioning)
confidence: 99%
“…Traditional authorship studies use word-form and part-of-speech n-grams (Stamatatos, 2009; Grieve, 2007, for a range of feature types). While these features have been shown to be effective, there is a constant trade-off required between several factors (Antonia et al, 2013). Shorter n-grams (e.g.…”
Section: Features (mentioning)
confidence: 99%