2021
DOI: 10.1038/s41467-021-27222-7
Ensuring scientific reproducibility in bio-macromolecular modeling via extensive, automated benchmarks

Abstract: Each year vast international resources are wasted on irreproducible research. The scientific community has been slow to adopt standard software engineering practices, despite the increases in high-dimensional data, complexities of workflows, and computational environments. Here we show how scientific software applications can be created in a reproducible manner when simple design goals for reproducibility are met. We describe the implementation of a test server framework and 40 scientific benchmarks, covering …

Cited by 16 publications (10 citation statements). References 71 publications.
“…A fundamental problem is the fact that we do not have enough details to reproduce the experiment under 100% identical conditions, since these details were not included in the original publications, the actual code is not available, and there are contradictions between different publications by the same team. This is a common problem in computational research nowadays [38,39]. We had not anticipated the problem of reproducibility, otherwise we might have chosen a different method for calculating membrane insertion, but Brasseur's approach was appealing because of its simplicity and computational efficiency, which lends itself very well to interactive studies.…”
Section: Discussion
confidence: 99%
“…As a result, it is practical to iteratively run the tests to maintain reproducibility and for continuous optimization. For this purpose, the tests run continuously on the Rosetta Benchmark Server () and the source code is publicly distributed through GitHub () to make the tests accessible to all membrane protein modeling developers. We hope these resources will help the community share standardized metrics for evaluating membrane protein energy functions.…”
Section: Discussion
confidence: 99%
“…Therefore, protein coordinate translation was performed by structural superposition of the protein to its equivalent structure obtained from the Orientations of Proteins in Membranes (OPM) database ( Lomize et al, 2012 ), which lies already within those coordinates. Next, the membrane plane was calculated using Rosetta ( Alford et al, 2015 ; Koehler Leman et al, 2017b ) and the protein structure was relaxed as described in Koehler Leman et al (2021 ). Finally, ΔΔG values for each variant were calculated as the residual energy of the variant minus the energy of the wild type.…”
Section: Methods
confidence: 99%
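The ΔΔG definition quoted above reduces to a simple difference of scored energies. A minimal sketch of that arithmetic, with hypothetical function and variable names and made-up energy values (the actual Rosetta scoring calls are not shown in the quoted text):

```python
# Illustration of the quoted ΔΔG definition:
#   ΔΔG = E(variant) − E(wild type)
# Energy values below are invented placeholders, not real Rosetta output.

def delta_delta_g(variant_energy: float, wildtype_energy: float) -> float:
    """Residual energy of the variant relative to the wild type."""
    return variant_energy - wildtype_energy

# Example with made-up scores (in Rosetta Energy Units):
ddg = delta_delta_g(variant_energy=-245.3, wildtype_energy=-248.1)
print(round(ddg, 1))  # positive ΔΔG here would indicate a destabilizing variant
```

Under this sign convention, a positive ΔΔG means the variant scores worse (less stable) than the wild type; the quoted protocol computes this value per variant after relaxing the membrane-embedded structure.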
“…cart_prot follows the same protocol as MP_flex_relax_ddG but is executed in Cartesian space. For all protocols, we used the membrane protein score function ‘franklin2019’ ( Alford et al, 2021 ), which performs comparably to older membrane scoring functions as recently evaluated in Koehler Leman et al (2021 ). Finally, we selected cart_prot as the computed values gave the best correlation with the experimental data (0.46), and additionally, the computed values have a high reproducibility, indicated by the low standard deviation for replicates (Fig.…”
Section: Methods
confidence: 99%