Hypothesis tests are a crucial statistical tool for data mining and are the workhorse of scientific research in many fields. Here we study differentially private tests of independence between a categorical and a continuous variable. We take as our starting point traditional nonparametric tests, which make no distributional assumptions (e.g., normality) about the data. We present private analogues of the Kruskal-Wallis, Mann-Whitney, and Wilcoxon signed-rank tests, as well as the parametric one-sample t-test. These tests use novel test statistics developed specifically for the private setting. We compare our tests to prior work on both parametric and nonparametric tests. We find that in all cases our new nonparametric tests achieve large improvements in statistical power, even when the assumptions of parametric tests are met.
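To make the idea of a private rank-based test concrete, the following is a minimal sketch and not the novel statistics the abstract refers to. It adds Laplace noise to the classical Mann-Whitney U statistic and calibrates the p-value by simulating the same noisy statistic under the null. The function names, the assumption that group sizes are public, the assumption that neighbouring datasets differ in one value within a group, and the assumption of tie-free data in the null simulation are all illustrative choices, and the floating-point noise sampler is not hardened for production privacy guarantees.

```python
import random


def mann_whitney_u(x, y):
    """Count pairs (xi, yj) with xi > yj; a tie counts as one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_u_test(x, y, epsilon, n_sim=2000, seed=0):
    """Hypothetical helper: noisy U statistic plus a simulated two-sided p-value."""
    rng = random.Random(seed)
    n1, n2 = len(x), len(y)
    # Assumption: group sizes are public and one value changes within a group,
    # so U moves by at most max(n1, n2) between neighbouring datasets.
    scale = max(n1, n2) / epsilon
    noisy_u = mann_whitney_u(x, y) + laplace_noise(scale, rng)
    center = n1 * n2 / 2.0  # mean of U under the null
    observed = abs(noisy_u - center)

    # Null reference distribution: random assignment of tie-free ranks to groups,
    # with the same noise added. Only n1, n2, and epsilon are used here, so the
    # p-value is post-processing of the noisy statistic.
    ranks = list(range(1, n1 + n2 + 1))
    extreme = 0
    for _ in range(n_sim):
        rng.shuffle(ranks)
        sim_u = mann_whitney_u(ranks[:n1], ranks[n1:]) + laplace_noise(scale, rng)
        if abs(sim_u - center) >= observed:
            extreme += 1
    p_value = (extreme + 1) / (n_sim + 1)
    return noisy_u, p_value
```

Because the reference distribution includes the privacy noise, the test keeps its nominal level under these assumptions, at the cost of power when epsilon is small.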
infer implements an expressive grammar for statistical inference that adheres to the tidyverse design framework (Wickham et al., 2019). Rather than providing methods for specific statistical tests, this package consolidates the principles that are shared among common hypothesis tests and confidence intervals into a set of four main verbs (functions), supplemented with many utilities to visualize and extract value from their outputs.
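The four verbs in infer are specify(), hypothesize(), generate(), and calculate(). As a rough illustration of how one shared pipeline can express a permutation test, here is a Python sketch that borrows those verb names. It is an analogue written for this example, not the package's R API: the function signatures, the dictionary used to carry the null hypothesis, and the toy data are all assumptions.

```python
import random
from statistics import mean


def specify(rows, response, explanatory):
    """Keep only the (explanatory, response) pair from each record."""
    return [(row[explanatory], row[response]) for row in rows]


def hypothesize(pairs):
    """Attach the null hypothesis (independence) to the specified data."""
    return {"pairs": pairs, "null": "independence"}


def generate(model, reps, seed=0):
    """Yield permuted datasets: responses shuffled, group labels held fixed."""
    rng = random.Random(seed)
    groups = [g for g, _ in model["pairs"]]
    values = [v for _, v in model["pairs"]]
    for _ in range(reps):
        shuffled = values[:]
        rng.shuffle(shuffled)
        yield list(zip(groups, shuffled))


def calculate(pairs, order):
    """Difference in group means, first group in `order` minus the second."""
    a, b = order
    return mean(v for g, v in pairs if g == a) - mean(v for g, v in pairs if g == b)


def p_value_two_sided(observed, null_stats):
    """Share of null statistics at least as extreme as the observed one."""
    hits = sum(abs(s) >= abs(observed) - 1e-12 for s in null_stats)
    return (hits + 1) / (len(null_stats) + 1)


# Toy data, invented for illustration only.
rows = [{"group": "a", "y": v} for v in (5.1, 4.8, 5.6, 5.3)]
rows += [{"group": "b", "y": v} for v in (4.1, 3.9, 4.4, 4.0)]

pairs = specify(rows, response="y", explanatory="group")
observed = calculate(pairs, order=("a", "b"))
null_stats = [calculate(p, order=("a", "b"))
              for p in generate(hypothesize(pairs), reps=1000)]
p_value = p_value_two_sided(observed, null_stats)
```

Swapping the statistic in calculate or the resampling scheme in generate changes the test without touching the rest of the pipeline, which is the consolidation the abstract describes.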
1. Neighborhood competition models are powerful tools to measure the effect of interspecific competition. Statistical methods to ease the application of these models are currently lacking.
2. We present the forestecology package, which provides methods to i) specify neighborhood competition models, ii) evaluate the effect of competitor species identity using permutation tests, and iii) measure model performance using spatial cross-validation. Following Allen (2020), we implement a Bayesian linear regression neighborhood competition model.
3. We demonstrate the package’s functionality using data from the Smithsonian Conservation Biology Institute’s large forest dynamics plot, part of the ForestGEO global network of research sites. Given ForestGEO’s data collection protocols and data formatting standards, the package was designed with cross-site compatibility in mind. We highlight the importance of spatial cross-validation when interpreting model results.
4. The package features i) a tidyverse-like structure whereby verb-named functions can be modularly “piped” in sequence, ii) functions with standardized inputs/outputs of the simple features ‘sf’ package class, and iii) an S3 object-oriented implementation of the Bayesian linear regression model. These three features allow for clear articulation of all the steps in the sequence of analysis and easy wrangling and visualization of the geospatial data. Furthermore, while the package currently implements only Bayesian linear regression, it was designed with extensibility to other methods in mind.
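Spatial cross-validation differs from ordinary cross-validation in that folds are formed from contiguous regions, so that spatially correlated neighbours do not leak between training and test sets. The sketch below shows one common way to do this, with square grid blocks as folds and a buffer that drops training points close to any test point. It is written in Python for illustration and is not the forestecology implementation; the function names, the grid-block scheme, and the buffer rule are assumptions.

```python
import math


def assign_folds(points, block_size):
    """Give each (x, y) point the fold id of the square grid block containing it."""
    block_ids = [(int(x // block_size), int(y // block_size)) for x, y in points]
    fold_of_block = {block: i for i, block in enumerate(sorted(set(block_ids)))}
    return [fold_of_block[block] for block in block_ids]


def spatial_cv_splits(points, folds, buffer):
    """Yield (train_idx, test_idx) per fold, dropping training points that lie
    within `buffer` distance of any test point."""
    for fold in sorted(set(folds)):
        test_idx = [i for i, f in enumerate(folds) if f == fold]
        train_idx = [
            i for i, f in enumerate(folds)
            if f != fold
            and all(math.dist(points[i], points[j]) > buffer for j in test_idx)
        ]
        yield train_idx, test_idx


# Toy coordinates, invented for illustration only.
points = [(1, 1), (2, 2), (9, 1), (11, 1), (18, 18)]
folds = assign_folds(points, block_size=10)
splits = list(spatial_cv_splits(points, folds, buffer=3))
```

In this toy layout the points at (9, 1) and (11, 1) sit in different blocks but only 2 units apart, so each is excluded from the training set whenever the other is being tested.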