In psychological science, self-report scales are widely used to compare means of targeted latent constructs across time points, groups, or experimental conditions. For these scale mean comparisons (SMCs) to be meaningful and unbiased, the scales should be measurement invariant across the compared time points or (experimental) groups. Measurement invariance (MI) testing checks whether the latent constructs are measured equivalently across groups or time points. Because MI is essential for meaningful comparisons, we conducted a systematic review to check whether MI is taken seriously in psychological research. Specifically, we sampled 426 psychology articles with openly available data, involving a total of 918 SMCs, to (1) investigate common practices in conducting and reporting MI tests, (2) check whether reported MI test results can be reproduced, and (3) conduct MI tests for the SMCs for which the shared data enabled sufficiently powerful testing. Our results indicate that (1) only 4% of the 918 SMCs were accompanied by MI tests across groups or time, and these tests were generally poorly reported; (2) none of the reported MI tests could be successfully reproduced; and (3) of 161 newly performed MI tests, a mere 46 (29%) reached sufficient MI (scalar invariance), and MI often failed completely (89; 55%). Thus, MI tests were rarely conducted and poorly reported in psychological studies, and the frequent violations of MI imply that reported group differences cannot be attributed solely to differences in the latent constructs. We offer recommendations on reporting MI tests and improving computational reproducibility practices.