2006
DOI: 10.1093/llc/fqm003

Employing Thematic Variables for Enhancing Classification Accuracy Within Author Discrimination Experiments

Cited by 9 publications
(9 citation statements)
References 18 publications
“…They are syntactic, lexical and structural features. The various lexical features are content words, letter frequency, special characters, character n-grams, misspellings in [10,11], special use words in [12], 8 punctuation marks in [13], most frequent types [14], spelling errors, word form errors in [15], syntactically classified punctuation, syntactic structure [16], function word frequencies, POS trigrams or sequences of 3 in [17], word n-grams in [18], POS bigrams or sequences of 2 in [19], unigrams/types shared by training and testing samples in [20], 1024-character sequences in [21], content words, frequent words in [22], words bigrams or sequences in [23], emoticons, netabbrevs in [24], character n-grams or sequences in [25], non-function words in [26], frequency of lemmas (dictionary entry headwords), frequency of negative words in [27]. The various syntactic features are punctuation, function words [10], POS tags [11], syntactically classified punctuation, verbal phrases [15], syntactically classified punctuation [16], function word-token ratios [28], POS trigrams or sequences of 3 [17], punctuation frequency [18], POS bigrams or sequences of 2 [19], PCFG-obtained POS [26], syntactically classified punctuation [29], verbal phrases [30], phrase types, words per phrase type [31].…”
Section: E-mail Address: Nagkanna80@gmailcom | mentioning
confidence: 99%
“…Because this list depends on the corpus from which it is extracted, its size ranged from 1,402 types for the smallest corpus to 13,089 for the largest. Note that punctuation was detached from the lexical units it was joined to, and the separated marks were used as independent lexical unigrams, a common procedure in authorship attribution [9], [20], [25] and [26]. As for the syntactic features, we used a previously compiled list (built for another classification task) of multiword functional lexical items, that is, items consisting of more than one word.…”
Section: Authorship Classification Features (Rasgos Clasificatorios De Autoría) | unclassified
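The tokenization step described in the statement above (punctuation detached from words and kept as independent lexical unigrams) can be illustrated with a minimal Python sketch. The regular expression, the function names `lexical_unigrams` and `type_frequencies`, and the lowercasing step are assumptions made for illustration; the citing paper does not specify its implementation.

```python
import re
from collections import Counter

# Assumed token pattern (not from the cited paper): a token is either a run of
# word characters or a single non-space, non-word character (a punctuation mark).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def lexical_unigrams(text):
    """Split text into unigrams, detaching punctuation from adjacent words."""
    # Lowercasing is an illustrative choice, not stated in the source.
    return TOKEN_RE.findall(text.lower())


def type_frequencies(text):
    """Count how often each type (distinct unigram) occurs in the text."""
    return Counter(lexical_unigrams(text))


sample = "Hola, mundo. Hola!"
tokens = lexical_unigrams(sample)
freqs = type_frequencies(sample)
print(tokens)
print(len(freqs), "types")
```

The number of keys in the resulting counter corresponds to the corpus-dependent type count the statement refers to, so a larger corpus would generally yield a longer list.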
“…Generation of multiple clusterings. The notion that text collections may be clustered in multiple independent ways has been discussed in the literature on computational stylistics (see Lim, Lee, & Kim, 2005; Biber & Kurjian, 2006; Grieve-Smith, 2006; Tambouratzis & Vassiliou, 2007; Gries, Wulff, & Davies, 2010, for example). In machine learning, there have been attempts to design algorithms for producing multiple clusterings of a dataset.…”
Section: Related Work | mentioning
confidence: 99%