2021
DOI: 10.1177/00811750211053370

Language Models in Sociological Research: An Application to Classifying Large Administrative Data and Measuring Religiosity

Abstract: Computational methods have become widespread in the social sciences, but probabilistic language models remain relatively underused. We introduce language models to a general social science readership. First, we offer an accessible explanation of language models, detailing how they estimate the probability of a piece of language, such as a word or sentence, on the basis of the linguistic context. Second, we apply language models in an illustrative analysis to demonstrate the mechanics of using these models in s…

Cited by 14 publications
(14 citation statements)
References 60 publications
“…For our second baseline, we follow Jensen et al (2021) who train language models to infer religiosity from Indonesian names. A language model computes the probabilities of n-grams from a training corpus.…”
Section: Baseline: Name2community (mentioning)
confidence: 99%
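The n-gram idea in the statement above can be illustrated with a minimal sketch: one character-bigram language model per name category, with a new name assigned to the category whose model gives it the highest log-probability. This is not the implementation used by Jensen et al. or by the citing paper. The toy names, the category labels, the add-one smoothing, the equal class priors, and every identifier (`CharBigramLM`, `train`, `classify`, `TOY_NAMES`) are assumptions made for illustration only.

```python
import math
from collections import Counter

BOS, EOS = "^", "$"  # hypothetical start/end-of-name markers


class CharBigramLM:
    """Character-bigram language model with add-alpha smoothing."""

    def __init__(self, names, vocab_size, alpha=1.0):
        self.alpha = alpha
        self.vocab_size = vocab_size
        self.bigrams = Counter()   # counts of (previous char, current char)
        self.contexts = Counter()  # counts of previous char
        for name in names:
            chars = [BOS] + list(name.lower()) + [EOS]
            for prev, cur in zip(chars, chars[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def log_prob(self, name):
        """Log-probability of a name as a sum of smoothed bigram log-probabilities."""
        chars = [BOS] + list(name.lower()) + [EOS]
        total = 0.0
        for prev, cur in zip(chars, chars[1:]):
            num = self.bigrams[(prev, cur)] + self.alpha
            den = self.contexts[prev] + self.alpha * self.vocab_size
            total += math.log(num / den)
        return total


def train(labeled_names, alpha=1.0):
    """Fit one model per label, sharing a vocabulary so scores are comparable."""
    vocab = {EOS}
    for names in labeled_names.values():
        for name in names:
            vocab.update(name.lower())
    vocab_size = len(vocab) + 1  # one extra slot for characters never seen in training
    return {
        label: CharBigramLM(names, vocab_size, alpha)
        for label, names in labeled_names.items()
    }


def classify(models, name):
    """Return the label whose model scores the name highest (equal priors assumed)."""
    return max(models, key=lambda label: models[label].log_prob(name))


# Invented toy data, far too small for real inference; for illustration only.
TOY_NAMES = {
    "arabic_style": ["muhammad", "abdullah", "ahmad", "fatimah"],
    "javanese_style": ["sutrisno", "suparman", "wartini", "sukarti"],
}

models = train(TOY_NAMES)
print(classify(models, "muhammad"))
print(classify(models, "sutrisno"))
```

A real application would need a large labeled training corpus, held-out evaluation, and likely longer n-grams or word-level units; the sketch only shows the mechanics of turning n-gram counts into comparable probabilities.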
“…Language models are designed to solve seemingly simple tasks like the prediction of a masked word in a sentence conditional on observed words surrounding it or the prediction of the next word in a left-to-right sequence (Jensen et al 2021). When trained to perform such tasks on large corpora, the resulting models are able to account for extensive variation in word sequence structures, approximating the semantic content of a natural language.…”
Section: Measuring Political Frames In Texts (mentioning)
confidence: 99%
“…Nevertheless, people often name their children in ways that (consciously or unconsciously) signal gender, racial/ethnic, religious, and even class membership [9, 15, 16, 17]. Other times, they resist these associations by deliberately choosing ambiguous names for their children [15, 16], or by changing their own names to have different associations later in life. The aggregate result of these choices is an imperfect cultural consensus around the gendered, racialized, and other associations of many names.…”
Section: Why Are Demographics Correlated With Names? (mentioning)
confidence: 99%
“…Conduct inference specific to a population using domain expertise. Jensen et al [17] use their knowledge of the Indonesian regency of Indramayu, where the choice of Javanese, Indonesian (Bahasa), or Arabic names for children is a strong signal of religiosity, to develop a custom name-based religiosity inference model that works well in this setting, but would not translate to many other contexts. This method of combining qualitative and quantitative analysis is the inverse of Nelson's [49] computational grounded theory.…”
Section: Recommendations (mentioning)
confidence: 99%