Companion Proceedings of the Web Conference 2021
DOI: 10.1145/3442442.3452350

Inferring Sociodemographic Attributes of Wikipedia Editors: State-of-the-art and Implications for Editor Privacy

Abstract: In this paper, we investigate the state-of-the-art of machine learning models to infer sociodemographic attributes of Wikipedia editors based on their public profile pages and corresponding implications for editor privacy. To build models for inferring sociodemographic attributes, ground truth labels are obtained via different strategies, using publicly disclosed information from editor profile pages. Different embedding techniques are used to derive features from editors' profile texts. In comparative evaluat…

Cited by 3 publications
(3 citation statements)
References 20 publications
“…We acknowledge the Master’s thesis by Brückner (71), which identified a potential pattern in data and provided an inspiration for the design of the study presented in this paper. The initial data collection and experiments were carried out as part of Camelia Oprea’s Master’s thesis (72).…”
Section: Acknowledgments | Citation type: mentioning
confidence: 99%
“…Next, we analyze the similarity between the usernames of the parent and child accounts. For this, we compute the normalized Levenshtein distance between the two usernames. We find that, on average, the usernames of accounts within an evasion pair are more similar when compared to the matched pairs (0.86 vs. 0.91, 𝑝 < 0.001); see Figure 2c.…”
Section: Non-evading Malicious Accounts | Citation type: mentioning
confidence: 99%
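
The username comparison in the citation statement above can be illustrated with a minimal sketch of a normalized Levenshtein distance. The excerpt does not say how the citing paper normalizes the distance, so dividing by the length of the longer string is an assumption here, and the function names `levenshtein` and `normalized_levenshtein` are hypothetical. Under this convention, 0.0 means identical strings and 1.0 means maximally different strings, so a lower value indicates more similar usernames.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in the inner loop so each row is small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length.
    ASSUMPTION: this normalization is illustrative; the quoted paper
    may use a different one."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty strings are treated as identical
    return levenshtein(a, b) / longest
```

Calling `normalized_levenshtein` on each pair of usernames and averaging the results per group would give a comparison of the same shape as the one the statement reports, though not necessarily the same numbers.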
“…To this end, we analyze the behavioral attributes of ban evaders and develop methods to detect and predict ban evasion based on these attributes, while not relying on sensitive information that is only available for specific platforms. 1 Platforms like Wikipedia do not enforce sharing personal details because, for a few contributors, having their "real life" identity discovered can threaten their "well-being, careers, or even personal safety" [6,45].…”
Section: Introduction | Citation type: mentioning
confidence: 99%