A systematic and reproducible “workflow,” the process that moves a scientific investigation from raw data to coherent research question to insightful contribution, should be a fundamental part of academic data-intensive research practice. In this paper, we elaborate basic principles of a reproducible data analysis workflow by defining three phases: the Explore, Refine, and Produce Phases. Each phase is roughly centered around the audience to whom research decisions, methodologies, and results are being immediately communicated. Importantly, each phase can also give rise to a number of research products beyond traditional academic publications. Where relevant, we draw analogies between design principles and established practice in software development. The guidance provided here is not intended to be a strict rulebook; rather, the suggested practices and tools for reproducible, sound data-intensive analysis may support both students new to research and current researchers who are new to data-intensive work.
Small datasets comprising observations made under conditions of repeatability or of reproducibility pervade the practice of measurement science. Many laboratories typically make only one determination, occasionally two, and only rarely three or more replicate determinations of the same measurand. Interlaboratory comparisons (including key comparisons) and meta-analyses often involve only a handful of participants. These limitations pose considerable challenges to the production of reliable uncertainty evaluations. This contribution, intended for metrologists, describes techniques that may be employed to address these challenges either when the only information in hand is what those few observations provide, or when there is also preexisting knowledge about the measurement procedure or about the measurand. Although the technical details vary, the key message is persistently the same: there is no universal solution to the challenges raised by small datasets, and if a measurand is worth measuring, then the observations deserve a customized treatment responsive to the peculiarities of the case, and a level of effort sufficient to render the final result fit for its intended purpose. The focus is on the measurement of scalar measurands, as in the Guide to the Expression of Uncertainty in Measurement (GUM), but the range of measurement models considered is much wider than the GUM entertains. We review the advantages of the Hodges–Lehmann estimator as a general-purpose replacement for the arithmetic average in all cases where the replicated observations are approximately symmetrically distributed around a central, typical value. We illustrate the application of empirical Bayes methods to uncertainty evaluations, in particular in the context of data reductions of small datasets.
Metrologists who are skeptical about the use of subjective prior distributions may derive some value from this novel application, and thereby develop an appreciation for how Bayesian procedures can help address the challenges posed by small datasets. The estimates of the measurand that different approaches produce often agree, at least approximately, but the corresponding uncertainty quantifications may differ markedly. In one example, involving three observations, a Bayesian approach yields a coverage interval appreciably narrower than the GUM’s approach. In another example, involving only two observations, an approach that makes far less restrictive assumptions than the GUM produces a confidence interval that is almost as narrow as the conventional interval.
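The Hodges–Lehmann estimator mentioned in this abstract can be illustrated with a minimal sketch. It uses the standard one-sample definition (the median of all pairwise Walsh averages, each observation paired with itself included), which is not necessarily the exact formulation the authors use. The function name `hodges_lehmann` and the replicate values are hypothetical, chosen only for illustration.

```python
from itertools import combinations_with_replacement
from statistics import mean, median


def hodges_lehmann(observations):
    """One-sample Hodges-Lehmann estimate of location.

    Computed as the median of all Walsh averages (x_i + x_j) / 2
    taken over pairs with i <= j, so each observation is also
    paired with itself.
    """
    values = list(observations)
    if not values:
        raise ValueError("at least one observation is required")
    walsh_averages = [
        (a + b) / 2 for a, b in combinations_with_replacement(values, 2)
    ]
    return median(walsh_averages)


if __name__ == "__main__":
    # Hypothetical replicate determinations; the last one is an outlier.
    replicates = [1, 2, 3, 100]
    print("arithmetic average:", mean(replicates))
    print("Hodges-Lehmann:    ", hodges_lehmann(replicates))
```

With these made-up values the arithmetic average is pulled toward the outlying observation, while the Hodges–Lehmann estimate stays near the bulk of the data. The brute-force enumeration of pairs is quadratic in the number of observations, which is not a concern for the small datasets discussed here.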
Significance: Conservation outreach has long depended on an intuitive sense of which species are more “charismatic” or engaging, for example by focusing on certain charismatic megafauna in advertising materials. Online community science databases like eBird and iNaturalist provide records of how people engage with different birds under differing data collection protocols. Comparisons between the two databases reveal biases in bird reporting rates. Larger, more colorful, and rarer birds are reported preferentially in opportunistic iNaturalist records compared with more systematic eBird records. These relationships, and the species-specific engagement indexes determined from these data, can be applied to conservation and outreach efforts to help foster a public relationship with nature, and can be used to improve models that draw on these two databases.