2023
DOI: 10.1017/pan.2023.6

It’s All in the Name: A Character-Based Approach to Infer Religion

Abstract: Large-scale microdata on group identity are critical for studies on identity politics and violence but remain largely unavailable for developing countries. We use personal names to infer religion in South Asia, where religion is a salient social division and yet disaggregated data on it are scarce. Existing work predicts religion using a dictionary-based method and, therefore, cannot classify unseen names. We provide character-based machine-learning models that can classify unseen names too with high accuracy…
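The contrast the abstract draws between dictionary lookup and character-based classification can be illustrated with a minimal sketch. This is not the authors' model: the n-gram overlap scorer, the function names (char_ngrams, train, predict), the toy names, and the neutral labels "A" and "B" are all assumptions made for illustration. It only shows why a name missing from the dictionary can still be scored from the character sequences it shares with names seen in training.

```python
from collections import Counter

def char_ngrams(name, n=3):
    """Return the character n-grams of a name, padded with boundary markers."""
    padded = "^" + name.lower().strip() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def train(labeled_names, n=3):
    """Count n-gram occurrences per label (a toy stand-in for a real model)."""
    counts = {}
    for name, label in labeled_names:
        counts.setdefault(label, Counter()).update(char_ngrams(name, n))
    return counts

def predict(counts, name, n=3):
    """Return the label whose training n-grams overlap most, or None if none overlap."""
    grams = char_ngrams(name, n)
    scores = {label: sum(c[g] for g in grams) for label, c in counts.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# Hypothetical toy training data with neutral group labels (assumption).
TRAIN = [("ramesh", "A"), ("suresh", "A"), ("mahesh", "A"),
         ("imran", "B"), ("irfan", "B"), ("rizwan", "B")]

dictionary = dict(TRAIN)   # dictionary-based method: exact lookup only
model = train(TRAIN)       # character-based method: shared n-grams

unseen = "dinesh"          # not present in TRAIN
dictionary_result = dictionary.get(unseen)   # lookup fails for unseen names
character_result = predict(model, unseen)    # scored via the "esh" and "sh$" n-grams
```

Here the lookup returns nothing for the unseen name, while the n-gram scorer assigns it to group "A" because it shares the ending trigrams with the training names in that group. A real model would replace the overlap count with a trained classifier over such character features.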

Cited by 8 publications (6 citation statements)
References 58 publications
“…In Appendix B, we discuss how we select the sample of names for manual annotation and choose a more suitable classification threshold. Chaturvedi and Chaturvedi [13] report an F1 score of 95% on their test set, which reduces to around 85% when only one name part is available (i.e., only the first or last name), as is often the case on Twitter; we find comparable values (sensitivity 84% and specificity 86.5%). The misclassifications lead to measurement error, potentially underestimating polarization; though qualitatively we expect the temporal patterns to remain the same.…”
Section: Inferring Religion (supporting)
confidence: 47%
“…Since the data on religious identity is not available on Twitter, we infer this from usernames. In India, names are highly predictive of group identity and we leverage character sequence-based machine learning models from our earlier work [13] to obtain religion estimates. Previous research uses these to examine disparities in allocation of publicly provided goods [14], impact of government surveillance on minority voter turnout [2], and potential election irregularities [16].…”
Section: Inferring Religion (mentioning)
confidence: 99%