2006
DOI: 10.1007/11890393_29

Chat Mining for Gender Prediction

Abstract: The aim of this paper is to investigate the feasibility of predicting the gender of a text document's author from linguistic evidence. To this end, term- and style-based classification techniques are evaluated on a large collection of chat messages. Prediction accuracies of up to 84.2% are achieved, illustrating the applicability of these techniques to gender prediction. The reverse problem is also explored, and the effect of gender on writing style is discussed.
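
Style-based classification starts by turning each chat message into numeric stylistic features. The sketch below is only an illustration and is not the paper's feature set: the emoticon regex `SMILEY_RE`, the `StyleFeatures` fields, and the `style_features` function are hypothetical choices that show how per-message counts (length, smileys, punctuation, capitalization) might be computed before being passed to a classifier.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately crude emoticon pattern (e.g. ":)", ";-)", ":D", "=P").
# The paper's own smiley inventory is not reproduced here.
SMILEY_RE = re.compile(r"[:;=][-o*']?[)\](\[dDpP/\\]")


@dataclass
class StyleFeatures:
    n_chars: int        # message length in characters
    n_words: int        # whitespace-separated tokens
    smiley_rate: float  # emoticons per token
    punct_rate: float   # selected punctuation marks per character
    upper_rate: float   # uppercase letters per letter


def style_features(message: str) -> StyleFeatures:
    """Compute a few illustrative stylistic features for one chat message."""
    words = message.split()
    n_chars = len(message)
    smileys = len(SMILEY_RE.findall(message))
    punct = sum(1 for ch in message if ch in "!?.,;:")
    upper = sum(1 for ch in message if ch.isupper())
    letters = sum(1 for ch in message if ch.isalpha())
    return StyleFeatures(
        n_chars=n_chars,
        n_words=len(words),
        smiley_rate=smileys / max(len(words), 1),
        punct_rate=punct / max(n_chars, 1),
        upper_rate=upper / max(letters, 1),
    )
```

Rates rather than raw counts are used so that short and long messages remain comparable; the `max(..., 1)` guards avoid division by zero on empty input.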

Cited by 46 publications (27 citation statements); references 13 publications.

“…Other studies of this nature include [5,29,95] with largely similar results, although on different languages, genres, and methods. For example, [95] studied Internet chat logs (e.g., IRC, IM, ICQ, and such) in Turkish using a feature set most notable for its inclusion of smileys and a variety of classifiers including neural networks, k-nearest neighbor, and naive Bayesian analysis.…”
Section: Gender; citation type: mentioning; confidence: 84%
“…For example, [95] studied Internet chat logs (e.g., IRC, IM, ICQ, and such) in Turkish using a feature set most notable for its inclusion of smileys and a variety of classifiers including neural networks, k-nearest neighbor, and naive Bayesian analysis. The results from Bayesian analysis were again in the 80% range, with other methods slightly or significantly less.…”
Section: Gender; citation type: mentioning; confidence: 99%
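
As a rough illustration of the naive Bayesian approach mentioned above, here is a minimal multinomial naive Bayes classifier over lowercase whitespace tokens with add-alpha (Laplace) smoothing. It is a generic textbook sketch, not the implementation used in [95] or in this paper; the class name `NaiveBayesSketch`, the `alpha` parameter, and the tokenization are assumptions made for illustration.

```python
import math
from collections import Counter


class NaiveBayesSketch:
    """Multinomial naive Bayes over lowercase whitespace tokens (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant; 1.0 corresponds to Laplace smoothing

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_joint(self, doc):
        """Unnormalized log P(class) + sum of log P(word | class), per class."""
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / n_docs)
            denom = self.totals[c] + self.alpha * vocab_size
            for word in doc.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[c][word] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_joint(doc)
        return max(scores, key=scores.get)
```

Scores are kept in log space so that long messages do not underflow; for a term-based gender classifier the labels would simply be the two gender classes.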
“…A wide range of learning methods has been applied to this purpose, such as k-Nearest Neighbor [10], Naive Bayes [11], Support Vector Machines [12], Voting [3], centroid classifier [9], etc.…”
Section: Related Work; citation type: mentioning; confidence: 99%
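
The centroid classifier listed above can be sketched as a nearest-centroid rule: each training message becomes an L2-normalized term-frequency vector, the vectors are averaged per class, and a new message is assigned to the class whose centroid has the highest cosine similarity. This is a generic sketch under assumed choices (raw term frequencies instead of tf-idf, whitespace tokenization); the function names are hypothetical and it is not the method of [9].

```python
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a sparse vector (dict of term -> weight) to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else dict(vec)


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(docs, labels):
    """Average the unit-length term-frequency vectors of each class."""
    sums = {}
    counts = Counter(labels)
    for doc, label in zip(docs, labels):
        vec = l2_normalize(Counter(doc.lower().split()))
        acc = sums.setdefault(label, {})
        for term, weight in vec.items():
            acc[term] = acc.get(term, 0.0) + weight
    return {c: {t: w / counts[c] for t, w in acc.items()} for c, acc in sums.items()}


def predict_centroid(centroids, doc):
    """Return the label whose centroid is most cosine-similar to the message."""
    vec = Counter(doc.lower().split())
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))
```

Normalizing each message before averaging keeps long messages from dominating a class centroid.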
“…Most existing work on predicting gender focuses on exploiting the linguistic production of the users (Koppel et al., 2003; Schler et al., 2006; Kucukyilmaz et al., 2006; Burger et al., 2011; Miller et al., 2012; Rangel et al., 2016), just rarely using nonlinguistic information such as metadata (Plank and Hovy, 2015) or visual information (Alowibdi et al., 2013).…”
Section: Introduction; citation type: mentioning; confidence: 99%