2022
DOI: 10.1007/978-3-031-04572-1_12
Predicting Human Psychometric Properties Using Computational Language Models

Cited by 12 publications (13 citation statements)
References 32 publications
“…The interpretation of our results relies on the assumption that more accurate LMs provide better estimators of human surprisal, at least for those words which drive the superlinear fit of our GAMs. As discussed above, this assumption is supported by recent literature (Goodkind & Bicknell, 2018; Hao et al., 2020; Laverghetta et al., 2022; Merkx & Frank, 2021; Wilcox et al., 2020). Very recently, however, another line of work has emerged arguing that, to the contrary, lower perplexity LMs sometimes provide poorer fits to psychometric data.…”
Section: Discussion (supporting)
confidence: 63%
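The excerpt above turns on per-word surprisal estimated by a language model. Below is a minimal sketch of that computation, assuming the Hugging Face transformers package and the public "gpt2" checkpoint; the model choice and example sentence are illustrative, not the pipeline used in any of the cited studies.

# Hedged sketch: per-word surprisal from GPT-2 via Hugging Face transformers.
# Model name and example sentence are illustrative assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

sentence = "The horse raced past the barn fell."
ids = tokenizer(sentence, return_tensors="pt").input_ids  # shape (1, seq_len)

with torch.no_grad():
    logits = model(ids).logits  # shape (1, seq_len, vocab_size)

log_probs = torch.log_softmax(logits, dim=-1)

# Surprisal of token t is -log2 P(token_t | tokens_<t). The first token has
# no left context under this simple scheme, so it is skipped.
ln2 = torch.log(torch.tensor(2.0))
for t in range(1, ids.size(1)):
    token_id = ids[0, t]
    surprisal_bits = (-log_probs[0, t - 1, token_id] / ln2).item()
    print(f"{tokenizer.decode(token_id)!r}: {surprisal_bits:.2f} bits")

Token-level surprisals would still need to be aggregated to word level (for example, summed over sub-word pieces) before being compared with human reading measures.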
“…While questions remain about the similarity between even the best modern LM’s predictions and those of humans, numerous studies in this area have found that higher quality LMs (those better able to predict test data) make better predictors of processing difficulty (Frank, 2009; Fossum & Levy, 2012; Goodkind & Bicknell, 2018; Wilcox et al., 2020). Additionally, recent work comparing architectures has found that surprisal estimates from Transformer-based LMs (Vaswani et al., 2017) tend to be the best predictors of psychometric measures (Hao et al., 2020; Merkx & Frank, 2021; Laverghetta et al., 2022). Only one recent published study (Wilcox et al., 2020) has fit nonlinear GAMs of the linking function using surprisals from a modern Transformer-based LM (GPT-2; Radford et al., 2019).…”
Section: Surprisal Theory (mentioning)
confidence: 99%
“…A number of studies have observed that higher quality language models (those better able to predict test data) make better predictors of processing difficulty (Frank, 2009; Fossum and Levy, 2012; Goodkind and Bicknell, 2018; Wilcox et al., 2020). Additionally, recent work comparing LM architectures has found that transformer-based LMs (Vaswani et al., 2017) tend to be the best predictors of psychometric measures (Hao et al., 2020; Merkx and Frank, 2021; Laverghetta et al., 2022). Only one recent study (Wilcox et al., 2020) has fit nonlinear models of the linking function using surprisals from a modern transformer-based LM (GPT-2; Radford et al., 2019).…”
Section: Empirical Studies In Surprisal Theorymentioning
confidence: 99%
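The linking function referred to in these excerpts maps surprisal onto a behavioural measure such as reading time, with a GAM letting the data decide whether that relationship is linear or superlinear. The sketch below uses the pygam package and synthetic placeholder data; the cited studies instead fit such models to eye-tracking or self-paced-reading corpora, so this is only an illustration of the modeling step.

# Hedged sketch: a GAM of reading time on surprisal, with word length as a
# covariate. Data are synthetic placeholders, not the cited studies' corpora.
import numpy as np
from pygam import LinearGAM, s

rng = np.random.default_rng(0)
n = 2000
surprisal = rng.gamma(shape=2.0, scale=3.0, size=n)        # bits per word
word_length = rng.integers(1, 12, size=n).astype(float)    # characters
# Toy reading times: roughly linear in surprisal, plus noise.
reading_time = 250 + 12 * surprisal + 5 * word_length + rng.normal(0, 30, n)

X = np.column_stack([surprisal, word_length])
# Smooth terms on both predictors; the shape of the fitted spline for
# surprisal indicates whether the linking function looks linear or not.
gam = LinearGAM(s(0) + s(1)).fit(X, reading_time)
gam.summary()

# Partial effect of surprisal on predicted reading time.
grid = gam.generate_X_grid(term=0)
effect = gam.partial_dependence(term=0, X=grid)

Comparing the fitted smooth against a strictly linear term (for example, by information criteria) is one way to probe the linearity-versus-superlinearity question raised in the excerpts above.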
“…For instance, earlier generation automated item generation (AIG) tools (e.g., Gierl & Haladyna, 2012) required test developers to construct item blueprints with articulated factors from prespecified cognitive models. Large language models (LLMs), part of the ML toolkit, can assist test developers in generating hundreds of thousands of items quickly, at scale, and with a large degree of variability (e.g., Belzak et al., 2023; Bezirhan & von Davier, 2023; Hommel et al., 2021; Laverghetta & Licato, 2023; von Davier, 2018). These items can be administered to candidates on large-scale standardized tests, leading to high-dimensional datasets with large degrees of sparsity across items of various response formats.…”
Section: Figure (mentioning)
confidence: 99%