2002
DOI: 10.1093/llc/17.4.401

Automatically Categorizing Written Texts by Author Gender

Abstract: The problem of automatically determining the gender of a document's author would appear to be a more subtle problem than those of categorization by topic or authorship attribution. Nevertheless, it is shown that automated text categorization techniques can exploit combinations of simple lexical and syntactic features to infer the gender of the author of an unseen formal written document with approximately 80% accuracy. The same techniques can be used to determine if a document is fiction or non-fiction with ap…

Cited by 462 publications (324 citation statements)
References 31 publications
“…More recently, Graham et al (2005) and Zheng et al (2006) used neural networks on a wide variety of features. Other studies used k-nearest neighbor (Kjell et al 1995; Hoorn et al 1999; Zhao & Zobel 2005), Naive Bayes (Kjell 1994a; Hoorn et al 1999; Peng et al 2004), rule learners (Holmes & Forsyth 1995; Holmes 1998; Argamon et al 1998; Koppel & Schler 2003; Abbasi & Chen 2005; Zheng et al 2006), support vector machines (De Vel et al 2001; Diederich et al 2003; Koppel & Schler 2003; Abbasi & Chen 2005; Koppel et al 2005; Zheng et al 2006), Winnow (Koppel et al 2002; Argamon et al 2003; Koppel et al 2006a), and Bayesian regression (Madigan et al 2006; Argamon et al 2008). Further details regarding these studies can be found in the Appendix.…”
Section: Machine Learning Approach (mentioning)
confidence: 99%
“…A number of studies used the output of syntactic text chunkers and parsers to create features, and found that they could considerably improve results based on traditional word based analysis alone (Baayen et al 1996; Stamatatos et al 2000, 2001; Gamon 2004; van Halteren 2004; Chaski 2005; Uzuner and Katz 2005; Hirst & Feiguina 2007). Many studies have used the frequencies of short sequences of parts-of-speech (or combinations of parts-of-speech and other classes of words) as a simple method for approximating syntactic features for this purpose (Argamon-Engelson et al 1998; Kukushkina et al 2001; De Vel et al 2001; Koppel et al 2002; Koppel & Schler 2003; Chaski 2005; Koppel et al 2005, 2006a; van Halteren et al 2005; Zhao et al 2006; Zheng et al 2006).…”
Section: Syntax and Parts-of-speech (mentioning)
confidence: 99%
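The statement above describes frequencies of short part-of-speech sequences as a stand-in for syntactic features. A minimal sketch of that idea follows, assuming the text has already been tagged by some external tagger. The function name `pos_ngram_frequencies`, the tag labels, and the sample sequence are hypothetical and are not taken from any of the cited studies.

```python
from collections import Counter
from typing import Dict, List, Tuple


def pos_ngram_frequencies(tags: List[str], n: int = 3) -> Dict[Tuple[str, ...], float]:
    """Return the relative frequency of each length-n sequence of POS tags.

    `tags` is assumed to be the output of a POS tagger, one tag per token.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n over the tag sequence.
    ngrams = [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]
    total = len(ngrams)
    if total == 0:
        # The sequence is shorter than n, so there is nothing to count.
        return {}
    counts = Counter(ngrams)
    # Normalize counts so documents of different lengths are comparable.
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical hand-tagged sequence, used only for illustration.
tags = ["DET", "NOUN", "VERB", "DET", "NOUN", "VERB", "DET", "NOUN"]
features = pos_ngram_frequencies(tags, n=2)
```

Each document would yield one such frequency dictionary, which could then be mapped onto a fixed feature vector for whichever learner is used. That mapping step is left out here.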
“…The parameters used in the questionnaire (translated from Swedish), categorized into five groups and ordered as opposites with stereotypic feminine traits (to the left) vs. stereotypic masculine traits (to the right). The order also corresponds to the order used in Table 2. In addition to the parameters 'intelligence' and 'persuasiveness' used in [22] we included twelve additional parameters (see above) for which people's conceptions and attitudes are known to relate to gender stereotypes [1,5,15]. Thus, the question was whether our manipulations of degree of femininity/masculinity via visual cues would be reflected in evaluations of the synthetic characters with respect to the gender stereotypical traits listed in Table 1.…”
Section: Issues Addressed In the Study (mentioning)
confidence: 99%
“…The original text was taken from a Swedish popular science magazine [21], and was adapted in order to be suitable for oral presentation. Furthermore, the text was modified in order to obtain equivalence in the two parts in terms of length, number of facts and of linguistic style, especially with respect to differences in male versus female linguistic styles [2,15]. The script parts were then pre-tested and validated in the sense that a number of readers and listeners could not decide whether they were written by a man or a woman.…”
Section: Script (mentioning)
confidence: 99%