2020
DOI: 10.31234/osf.io/e437b
Preprint

Putting psychology to the test: Rethinking model evaluation through benchmarking and prediction

Abstract: Consensus on standards for evaluating models and theories is an integral part of every science. Nonetheless, in psychology, relatively little focus has been placed on defining reliable communal metrics to assess model performance. Evaluation practices are often idiosyncratic, and are affected by a number of shortcomings (e.g., failure to assess models' ability to generalize to unseen data) that make it difficult to discriminate between good and bad models. Drawing inspiration from fields like machine learning …


citations
Cited by 13 publications
(12 citation statements)
references
References 101 publications
(116 reference statements)
“…Instead, a broad (best representative) sampling from the item universe is necessary to avoid trimming and artificially homogenizing item samples, which would result in oversimplified models. These two views on dealing with item uniqueness are also addressed by Rocca and Yarkoni (2020), who distinguished between model evaluation that follows an explanatory tradition and evaluation that follows a prediction-oriented approach. That is, instead of developing models that try to explain the underlying processes (i.e., data modeling), researchers could also evaluate models based on their ability to predict a specific outcome (i.e., algorithmic modeling), regardless of how this aim is achieved.…”
Section: The Present Study (mentioning, confidence: 99%)
“…Of the 55 participants included in DMCC55B, 31 were previously participants in the HCP Young Adult study (Van Essen et al. 2012), enabling researchers with HCP data access to combine data from the two studies. The DMCC55B can also be used as a benchmark dataset (Varoquaux 2018; Willems, Nastase, and Milivojevic 2020; Rocca and Yarkoni 2020), particularly for methodological investigations focused on higher cognitive brain functions or regions.…”
Section: Background and Summary (mentioning, confidence: 99%)
“…Another common drawback of traditional attrition modeling approaches is that it is unclear whether their results are generalizable. The ability of a model to provide accurate and generalizable predictions is especially essential in applied research such as study retention (Rocca & Yarkoni, 2020; Shmueli, 2010). To enable panel administrators to employ effective retention strategies (e.g., person-specific incentives at future waves), a prediction model also has to hold in future waves.…”
Section: Drawbacks of Common Approaches to Analyzing Panel Attrition (mentioning, confidence: 99%)