2019 International Conference on Information Technology (ICIT) 2019
DOI: 10.1109/icit48102.2019.00089

Author Profiling: Prediction of Gender and Language Variety from Document


Cited by 6 publications (4 citation statements). References 5 publications.
“…However, some papers considered common preprocessing techniques, similar to those used in PAN shared tasks [20]. At least five research groups represented in PAN's shared tasks from 2013 to 2017 removed retweet tags from the texts during preprocessing [13]-[17]; 17 groups removed hashtags [13]-[17], [20]; and 19 teams considered removing URLs. For the removal of the mentioned tags, 17 research groups considered removing them from the processed text.…”
Section: Related Work (mentioning)
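The preprocessing steps attributed to the PAN teams (removing retweet tags, hashtags, URLs, and @-mention tags) can be sketched with regular expressions. This is a minimal illustration under stated assumptions, not the method of any cited team: the function `preprocess`, its flags, and the exact patterns (for example treating a standalone `RT` as the retweet tag) are hypothetical choices made here.

```python
import re

# Hypothetical patterns for Twitter-specific tokens; real PAN systems may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # http(s) links and bare www. links
RETWEET_RE = re.compile(r"\bRT\b:?")            # standalone "RT" retweet marker
MENTION_RE = re.compile(r"@\w+:?")              # @user mentions, plus a trailing colon
HASHTAG_RE = re.compile(r"#\w+")                # #hashtags


def preprocess(text, remove_retweet=True, remove_hashtags=True,
               remove_urls=True, remove_mentions=True):
    """Strip the selected tweet artifacts and collapse leftover whitespace."""
    # URLs go first so that '#' or '@' inside a link is never treated separately.
    if remove_urls:
        text = URL_RE.sub(" ", text)
    if remove_retweet:
        text = RETWEET_RE.sub(" ", text)
    if remove_mentions:
        text = MENTION_RE.sub(" ", text)
    if remove_hashtags:
        text = HASHTAG_RE.sub(" ", text)
    return " ".join(text.split())


if __name__ == "__main__":
    tweet = "RT @user: Loving the #PAN2017 task! https://t.co/abc"
    print(preprocess(tweet))                         # all four steps applied
    print(preprocess(tweet, remove_hashtags=False))  # keep hashtags as features
```

Each step is a separate flag because, per the statement above, teams differed in which tags they removed; keeping hashtags, for instance, preserves a possible topical signal.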
“…Lundeqvist & Svensson removed HTML tags, and used Twitter custom tokenizer (nltk.tokenize package, NLTK 3.6.2 documentation) [19]. However, some papers considered common preprocessing techniques, similar to those used in PAN shared tasks [20]. At least five research groups represented in PAN's shared tasks from 2013 to 2017 removed retweet tags from the texts during preprocessing [13]-[17]; 17 groups removed hashtags [13]-[17], [20]; and 19 teams considered removing URLs.…”
Section: Related Work (mentioning)
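HTML-tag removal followed by tweet-aware tokenization can be sketched with the standard library alone. The statement credits NLTK's Twitter tokenizer (`nltk.tokenize.TweetTokenizer`); the regex below is only a simplified stand-in written for illustration, and the names `strip_html` and `tokenize` are hypothetical. Unlike NLTK's tokenizer, it does not keep emoticons such as `:)` together.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")  # anything between angle brackets counts as a tag

# Simplified tweet-aware token pattern (alternatives are tried left to right).
TOKEN_RE = re.compile(
    r"https?://\S+"      # URLs as single tokens
    r"|[@#]\w+"          # @mentions and #hashtags as single tokens
    r"|\w+(?:'\w+)?"     # words, allowing one internal apostrophe (don't)
    r"|[^\w\s]"          # any other single non-space symbol (punctuation, emoji)
)


def strip_html(text):
    """Remove HTML tags, then decode entities such as &amp; and &#39;."""
    # Tags are stripped before unescaping so that an encoded "&lt;3" survives as "<3".
    text = TAG_RE.sub(" ", text)
    text = html.unescape(text)
    return " ".join(text.split())


def tokenize(text):
    """Split text into tweet-style tokens using TOKEN_RE."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    raw = "<p>I don&#39;t  <b>like</b> Mondays &amp; rain</p>"
    clean = strip_html(raw)
    print(clean)
    print(tokenize(clean))
    print(tokenize("@bob check #nlp :) https://t.co/x"))
```

Where NLTK is available, `TweetTokenizer().tokenize(text)` would replace `tokenize` here and handle emoticons and elongated words more carefully.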
“…The paper [4] describes the approach to classifying the messages in social networks; the various combinations of n-gram symbols and n-gram words at the level of parts of speech were investigated. As a result, this approach demonstrated an accuracy of 70%.…”
Section: Related Work (mentioning)
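Combining character n-grams and word n-grams into one feature set, as the statement about [4] describes, can be sketched with `collections.Counter`. This is an assumption-laden illustration: the n-gram ranges, lowercasing, whitespace tokenization, and the function names are hypothetical, part-of-speech tagging is omitted, and nothing here reproduces the 70% accuracy reported in [4].

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams (empty if text is shorter than n)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(tokens, n):
    """Count overlapping word n-grams as tuples of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def extract_features(text, char_n=(2, 3), word_n=(1, 2)):
    """Merge character and word n-gram counts into one sparse feature Counter.

    Keys are tagged with their type and order, e.g. ("char", 2, "hi") or
    ("word", 1, ("hi",)), so the two kinds of n-grams never collide.
    """
    lowered = text.lower()
    features = Counter()
    for n in char_n:
        for gram, count in char_ngrams(lowered, n).items():
            features[("char", n, gram)] += count
    tokens = lowered.split()  # naive whitespace tokenization (assumption)
    for n in word_n:
        for gram, count in word_ngrams(tokens, n).items():
            features[("word", n, gram)] += count
    return features


if __name__ == "__main__":
    feats = extract_features("Hi hi there")
    print(feats[("char", 2, "hi")], feats[("word", 1, ("hi",))])
```

A sparse dictionary like this would typically be vectorized and passed to a linear classifier; tagging keys by type makes it easy to test character-only, word-only, or combined configurations.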
“…The stylometry is constantly being improved, although the universal methods leading to a reliable determination of authorship have not yet been developed for most tasks. Nevertheless, in recent years, the methods of stylometry have become actively used in the protection of information, as described in [1][2][3], and criminology [4].…”
Section: Introduction (mentioning)