Data, code, and workflows should be available and cited
A key component of scientific communication is sufficient information for other researchers in the field to reproduce published findings. For computational and data-enabled research, this has often been interpreted to mean making available the raw data from which results were generated, the computer code that generated the findings, and any additional information needed such as workflows and input parameters. Many journals are revising author guidelines to include data and code availability. This work evaluates the effectiveness of journal policy that requires the data and code necessary for reproducibility be made available postpublication by the authors upon request. We assess the effectiveness of such a policy by (i) requesting data and code from authors and (ii) attempting replication of the published findings. We chose a random sample of 204 scientific papers published in the journal after the implementation of their policy in February 2011. We found that we were able to obtain artifacts from 44% of our sample and were able to reproduce the findings for 26%. We find this policy (author remission of data and code postpublication upon request) an improvement over no policy, but currently insufficient for reproducibility.
Copublished by the IEEE CS and the AIP. Reproducible Research.
Massive computation is transforming science, as researchers from numerous fields launch ambitious projects involving large-scale computations. Emblems of our age include data mining for subtle patterns in vast databases, and massive simulations of a physical system's complete evolution repeated numerous times, as simulation parameters vary systematically.
The traditional image of the scientist as a solitary person working in a laboratory with beakers and test tubes is long obsolete. The more accurate image, not yet well recognized, depicts a computer jockey working at all hours to launch experiments on computer servers. In fact, today's academic scientist likely has more in common with a large corporation's information technology manager than with a philosophy or English professor at the same university. A rapid transition is now under way, visible particularly over the past two decades, that will finish with computation as absolutely central to scientific enterprise. However, the transition is very far from completion. In fact, we believe that the dominant mode of scientific computing has already brought us to a state of crisis. The prevalence of very relaxed attitudes about communicating experimental details and validating results is causing a large and growing credibility gap. It's impossible to verify most of the results that computational scientists present at conferences and in papers.
The crisis
To understand our claim, and the necessary response, we must look at the scientific process more broadly. Originally, there were two scientific methodological branches: deductive (for example, mathematics) and empirical (for example, statistical data analysis of controlled experiments).
Many scientists accept computation (for example, large-scale simulation) as the third branch; some believe this shift has already occurred, as one can see in grant proposals, keynote speeches, and newsletter editorials. However, while computation is already
Journal policy on research data and code availability is an important part of the ongoing shift toward publishing reproducible computational science. This article extends the literature by studying journal data sharing policies by year (for both 2011 and 2012) for a referent set of 170 journals. We make a further contribution by evaluating code sharing policies, supplemental materials policies, and open access status for these 170 journals for each of 2011 and 2012. We build a predictive model of open data and code policy adoption as a function of impact factor and publisher and find higher impact journals more likely to have open data and code policies and scientific societies more likely to have open data and code policies than commercial publishers. We also find open data policies tend to lead open code policies, and we find no relationship between open data and code policies and either supplemental material policies or open access journal status. Of the journals in this study, 38% had a data policy, 22% had a code policy, and 66% had a supplemental materials policy as of June 2012. This reflects a striking one year increase of 16% in the number of data policies, a 30% increase in code policies, and a 7% increase in the number of supplemental materials policies. We introduce a new dataset to the community that categorizes data and code sharing, supplemental materials, and open access policies in 2011 and 2012 for these 170 journals.
We describe multiscale representations for data observed on equispaced grids and taking values in manifolds such as the sphere $S^2$, the special orthogonal group $SO(3)$, the positive definite matrices $SPD(n)$, and the Grassmann manifolds $G(n, k)$. The representations are based on the deployment of Deslauriers-Dubuc and average-interpolating pyramids "in the tangent plane" of such manifolds, using the Exp and Log maps of those manifolds. The representations provide "wavelet coefficients" which can be thresholded, quantized, and scaled in much the same way as traditional wavelet coefficients. Tasks such as compression, noise removal, contrast enhancement, and stochastic simulation are facilitated by this representation. The approach applies to general manifolds but is particularly suited to the manifolds we consider, i.e., Riemannian symmetric spaces, such as $S^{n-1}$, $SO(n)$, $G(n, k)$, where the Exp and Log maps are effectively computable. Applications to manifold-valued data sources of a geometric nature (motion, orientation, diffusion) seem particularly immediate. A software toolbox, SymmLab, can reproduce the results discussed in this paper.
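The role of the Exp and Log maps in this abstract can be illustrated with a minimal sketch on the unit sphere $S^2$. It is not SymmLab and does not reproduce the paper's method: in place of a Deslauriers-Dubuc predictor it uses a much simpler two-point geodesic midpoint predictor, and all function names (`sphere_log`, `sphere_exp`, `predict_midpoint`, `detail_coefficient`, `reconstruct`) are hypothetical. The idea shown is only the general pattern: predict a fine-scale sample from coarse-scale neighbours, store the prediction error as a tangent vector (a "wavelet coefficient"), and recover the sample exactly with the Exp map.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def sphere_log(p, q):
    """Log map on the unit sphere: tangent vector at p pointing toward q,
    with length equal to the geodesic distance between p and q."""
    c = max(-1.0, min(1.0, dot(p, q)))
    theta = math.acos(c)
    # Component of q orthogonal to p gives the direction in the tangent plane.
    w = [qi - c * pi for pi, qi in zip(p, q)]
    n = norm(w)
    if n < 1e-12:
        # q coincides with p (or is antipodal, where Log is not unique).
        return [0.0, 0.0, 0.0]
    return [theta * wi / n for wi in w]

def sphere_exp(p, v):
    """Exp map: start at p and follow the geodesic with initial velocity v."""
    t = norm(v)
    if t < 1e-12:
        return list(p)
    return [math.cos(t) * pi + math.sin(t) * vi / t for pi, vi in zip(p, v)]

def predict_midpoint(p0, p1):
    """Assumed simple predictor: geodesic midpoint of two coarse samples."""
    half = [0.5 * x for x in sphere_log(p0, p1)]
    return sphere_exp(p0, half)

def detail_coefficient(predicted, actual):
    """Prediction error stored as a tangent vector at the predicted point."""
    return sphere_log(predicted, actual)

def reconstruct(predicted, detail):
    """Invert detail_coefficient: recover the actual sample."""
    return sphere_exp(predicted, detail)

if __name__ == "__main__":
    p0, p1 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]   # coarse-scale samples
    raw = [0.7, 0.7, 0.2]                        # fine-scale sample (unnormalized)
    actual = [x / norm(raw) for x in raw]
    pred = predict_midpoint(p0, p1)
    d = detail_coefficient(pred, actual)
    print("detail coefficient:", d)
    print("reconstruction:", reconstruct(pred, d))
```

Because the detail coefficient lives in a vector space (the tangent plane at the predicted point), it can be thresholded or quantized like an ordinary wavelet coefficient before reconstruction, which is the property the abstract highlights.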