The mosaic package provides a simplified and systematic introduction to the core functionality related to descriptive statistics, visualization, modeling, and simulation-based inference required in first and second courses in statistics. This introduction to the package describes some of the guiding principles behind the design of the package and provides illustrative examples of several of the most important functions it implements. These can be combined to help students "think with data" using R in their early course work, starting with simple, yet powerful, declarative commands.
MotivationIn order to make sense of rich data that is increasingly available, students need computational tools and facility for data management, exploratory analysis, visualization, and modeling (e.g., Nolan and Lang (2010);; Horton and Hardin (2015); Ridgway (2016)). To extract useful information from the complex systems that generate this rich data, students should learn to "think with data" (in the phrase coined by Diane Lambert of Google): using previous results from data to drive statistical investigations and to inform the choice of analysis and presentation possibilities.Yet many students enter statistics courses with little or no computational experience. Software such as R that encompass the tools for thinking with data are sometimes regarded as off-putting and inaccessible to students. With the mosaic package, we have sought to remove unnecessary difficulty in the use of R by students. Our students have demonstrated that it is feasible to integrate computing into our curricula early and often, in a way that provides students with success, confidence, and room to grow.The mosaic (Pruim et al., 2016b) package reflects attempts by each of the authors to make the power of R accessible and rewarding to students, especially in the context of undergraduate statistics courses, and, in one case, also in calculus. One of the guiding principles behind the development of the mosaic package has been "Less volume, more creativity." By "less volume," we mean reducing the cognitive load involved in using R. By "more creativity" we mean two things. First, we want to provide access to R tools in a way that fosters choices and decision making by students. Such decisions might be about modes of analysis or visualization or about the variables and covariates to consider when exploring relationships with data. Second, we want to encourage students to creatively engage with statistical inference methods via simulation techniques such as randomization and resampling with data analysis and presentation.Our route to "less volume" is to provide a set of three command templates -formulas, functions, and extractors -that standardize usage across many tasks and that highlight the connections between graphical summaries, numerical summaries, models, and inference. The templates themselves are designed to be consistent and concise. Consistency enables a student to generalize from a specific task to a wide set of possibilities. Conciseness means avoiding un...