Biological data are often intrinsically hierarchical (e.g., species from different genera, plants within different mountain regions), which has made mixed-effects models a common analysis tool in ecology and evolution because they can account for this non-independence. While many questions around their practical application are settled, one is still debated: Should we treat a grouping variable with a low number of levels as a random or a fixed effect? In such situations, the variance estimate of the random effect can be imprecise, but it is unknown whether this affects the statistical power and type I error rates of the fixed effects of interest. Here, we analyzed the consequences of treating a grouping variable with 2–8 levels as a fixed or random effect in correctly specified and alternative models (under- or overparametrized models). We calculated type I error rates and statistical power for all model specifications and quantified the influence of study design on these quantities. We found no influence of model choice on the type I error rate and power of the population-level effect (slope) for random-intercept-only models. However, with varying intercepts and slopes in the data-generating process, using a random-slope-and-intercept model, and switching to a fixed-effects model in case of a singular fit, avoids overconfidence in the results. Additionally, the number of levels and the differences between them strongly influence power and type I error. We conclude that inferring the correct random-effect structure is of great importance for obtaining correct type I error rates. We encourage starting with a mixed-effects model independent of the number of levels in the grouping variable and switching to a fixed-effects model only in case of a singular fit. With these recommendations, we allow for more informative choices about study design and data analysis and make ecological inference with mixed-effects models more robust for small numbers of levels.
Ecologists increasingly rely on complex computer simulations to forecast ecological systems. To make such forecasts precise, uncertainties in model parameters and structure must be reduced and correctly propagated to model outputs. Naively using standard statistical techniques for this task, however, can lead to bias and underestimation of uncertainties in parameters and predictions. Here, we explain why these problems occur and propose a framework for robust inference with complex computer simulations. After identifying that model error is more consequential in complex computer simulations because of their more pronounced nonlinearity and interconnectedness, we discuss possible solutions: rebalancing the data and adding bias corrections to model outputs or processes during or after the calibration procedure. We illustrate these methods in a case study using a dynamic vegetation model. We conclude that developing better methods for robust inference with complex computer simulations is vital for generating reliable predictions of ecosystem responses.
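To make the bias-correction idea concrete, here is a minimal pure-Python sketch (not the framework or the vegetation model from the study): a hypothetical one-parameter process model y = a·x is calibrated against data generated with a systematic offset. Calibrating a alone absorbs the structural error into the parameter; jointly estimating an additive bias-correction term b recovers the true parameter. All values (A_TRUE, OFFSET, the noise level) are invented for illustration.

```python
import random

random.seed(1)

A_TRUE, OFFSET = 2.0, 3.0  # hypothetical true parameter and systematic model error
xs = [i / 10 for i in range(1, 101)]
ys = [A_TRUE * x + OFFSET + random.gauss(0, 0.1) for x in xs]

# Naive calibration: model y = a*x with no bias term (closed-form least squares).
# The systematic offset is forced into the parameter estimate.
a_naive = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Calibration with an additive bias-correction term on the model output: y = a*x + b.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
a_bc = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b_bc = my - a_bc * mx

print(f"naive a: {a_naive:.2f}   bias-corrected a: {a_bc:.2f}, b: {b_bc:.2f}")
```

The naive estimate lands well above the true value of 2.0, while the bias-corrected calibration recovers both the parameter and the offset — a one-dimensional caricature of why unmodelled structural error biases parameters and predictions.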
Calibrating process-based models against multiple constraints often improves the identifiability of model parameters, helps to avoid several errors compensating for each other, and produces model predictions that are more consistent with the underlying processes. However, using multiple constraints can make the predictions of some variables worse. This is particularly common when combining data sources with very different sample sizes. Such unbalanced model-data fusion efforts are becoming increasingly common, for example when combining manual and automated measurements. Here we use a series of simulated virtual-data experiments to demonstrate and disentangle the underlying causes of the issues that can occur when calibrating models with multiple unbalanced constraints in combination with systematic errors in models and data. We propose a diagnostic tool to help identify whether a calibration is failing due to these factors. We also test the utility of adding terms representing systematic model and data error to the calibration. We show that unbalanced data by itself is not the problem: when fitting simulated data with the ‘true’ model, we can correctly recover the model parameters and the true dynamics of latent variables. However, when there are systematic errors in the model or the data, we cannot recover the correct parameters. Consequently, the modelled dynamics of the low-data-volume variables depart significantly from the true values. We demonstrate the utility of the diagnostic tool and show that it can also be used to identify the extent of imbalance before the calibration starts to ignore the sparser data. Finally, we show that representing uncertainty in model structural errors and data biases in the calibration can greatly improve the model fit to low-volume data and improve the coverage of uncertainty estimates.
We conclude that the underlying issue is not one of sample size or information content per se, despite the popularity of ad hoc approaches that focus on ‘weighting’ datasets to achieve balance. Our results emphasize the importance of considering model structural deficiencies and systematic data biases in the calibration of process-based models.
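The core claim — imbalance alone is harmless, but a systematic error lets the high-volume constraint dominate and degrades the fit to the sparse data — can be reproduced with an invented one-parameter model in a few lines (this is a sketch, not the study's calibration). Two data streams of very different sizes constrain a shared parameter by pooled least squares; a systematic bias is then added to the large stream only.

```python
import math
import random

random.seed(7)
A_TRUE = 1.5  # hypothetical true parameter shared by both data streams

def make(n, bias=0.0):
    # Generate one data stream for y = A_TRUE*x, optionally with a systematic bias.
    xs = [random.uniform(1, 10) for _ in range(n)]
    ys = [A_TRUE * x + bias + random.gauss(0, 0.2) for x in xs]
    return xs, ys

def fit(streams):
    # Pooled least squares for y = a*x across all constraints at once.
    sxy = sum(x * y for xs, ys in streams for x, y in zip(xs, ys))
    sxx = sum(x * x for xs, _ in streams for x in xs)
    return sxy / sxx

def rmse(a, xs, ys):
    return math.sqrt(sum((y - a * x) ** 2 for x, y in zip(xs, ys)) / len(xs))

big_ok, small = make(500), make(20)          # unbalanced but error-free
a_ok = fit([big_ok, small])

big_biased = make(500, bias=2.0)             # systematic error in the high-volume stream
a_bad = fit([big_biased, small])

rmse_ok, rmse_bad = rmse(a_ok, *small), rmse(a_bad, *small)
print(f"error-free: a = {a_ok:.3f},  RMSE on sparse data = {rmse_ok:.3f}")
print(f"biased:     a = {a_bad:.3f}, RMSE on sparse data = {rmse_bad:.3f}")
```

With error-free data the 25:1 imbalance is irrelevant and the parameter is recovered; with the bias present, the shared parameter is dragged toward the large stream and the fit to the unbiased sparse data deteriorates — no amount of reweighting changes the fact that the model is being asked to compensate for an unrepresented systematic error.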
Biological data are often intrinsically hierarchical. Due to their ability to account for such dependencies, mixed-effects models have become a common analysis technique in ecology and evolution. While many questions around their theoretical foundations and practical applications are solved, one fundamental question is still highly debated: When a grouping variable has a low number of levels, should we model it as a random or a fixed effect? In such situations, the variance of the random effect is presumably underestimated, but whether this affects the statistical properties of the fixed effects is unclear. Here, we analyze the consequences of including a grouping variable as a fixed or random effect, along with other possible modeling options (over- and underspecified models), for data with a small number of levels in the grouping variable (2–8). For all models, we calculated type I error rates, power, and coverage. Moreover, we show the influence of possible study designs on these statistical properties. We found that mixed-effects models correctly estimate the random-effect variance already for two groups. Moreover, model choice does not influence the statistical properties when there is no random slope in the data-generating process. However, if an ecological effect differs among groups, using a random-slope-and-intercept model, and switching to a fixed-effects model only in case of a singular fit, avoids overconfidence in the results. Additionally, power and type I error are strongly influenced by the number of groups and the differences between them. We conclude that inferring the correct random-effect structure is of high importance for obtaining correct statistical properties. When in doubt, we recommend starting with the simpler model and using model diagnostics to identify missing components. Once the correct structure is identified, we encourage starting with a mixed-effects model independent of the number of groups and switching to a fixed-effects model only in case of a singular fit.
With these recommendations, we allow for more informative choices about study design and data analysis and thus make ecological inference with mixed-effects models more robust for low numbers of groups.
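The overconfidence that arises when group-varying slopes are ignored can be demonstrated without any mixed-model machinery. The pure-Python Monte Carlo below is not the paper's simulation design: as a stand-in for the fixed- versus random-effect comparison, it contrasts a pooled regression that ignores the grouping with a two-stage analysis (one slope per group, then a t-test on the group slopes). The null hypothesis is true (the mean slope is zero), and all settings (4 groups, 30 observations each, slope and noise standard deviations) are invented for the example.

```python
import math
import random
import statistics

random.seed(42)

def slope_and_se(xs, ys):
    # Ordinary least-squares slope and its standard error.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    s2 = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    return b, math.sqrt(s2 / sxx)

def simulate(n_groups=4, n_per=30, sd_slope=0.5, sd_noise=0.5):
    # Null data-generating process: slopes vary between groups, mean slope is zero.
    groups = []
    for _ in range(n_groups):
        b_g = random.gauss(0, sd_slope)
        xs = [random.uniform(-1, 1) for _ in range(n_per)]
        ys = [b_g * x + random.gauss(0, sd_noise) for x in xs]
        groups.append((xs, ys))
    return groups

reps, pooled_rej, bygroup_rej = 2000, 0, 0
for _ in range(reps):
    groups = simulate()
    # (1) Pooled OLS that ignores the grouping entirely.
    xs = [x for g in groups for x in g[0]]
    ys = [y for g in groups for y in g[1]]
    b, se = slope_and_se(xs, ys)
    pooled_rej += abs(b / se) > 1.98    # ~5% critical value, df ~ 118
    # (2) Slope per group, then a one-sample t-test on the 4 group slopes (df = 3).
    slopes = [slope_and_se(*g)[0] for g in groups]
    t = statistics.mean(slopes) / (statistics.stdev(slopes) / math.sqrt(len(slopes)))
    bygroup_rej += abs(t) > 3.182       # 5% critical value, df = 3

print(f"pooled type I error:   {pooled_rej / reps:.3f}")
print(f"by-group type I error: {bygroup_rej / reps:.3f}")
```

The pooled analysis rejects the true null far more often than the nominal 5% because its standard error ignores the between-group slope variation, while the analysis that respects the grouping stays close to the nominal rate despite having only four groups — the same overconfidence the abstract warns about when the random-slope structure is omitted.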
1. Current modelling approaches for predicting spatially explicit biodiversity responses to climate change mainly focus on the direct effects of climate on species; the integration of spatiotemporal land-cover scenarios is still limited. Current approaches either regard land cover as a constant boundary condition or rely on general, typically globally defined land-use scenarios. This is problematic because it disregards the complex synergistic effects of climate and land use on biodiversity at the regional scale: the biophysical, economic, and social issues important for regional land-use decisions are themselves affected by climate change. To realistically predict climate impacts on biodiversity, it is therefore necessary to consider both the direct effect of climate change on biodiversity and its indirect effect on biodiversity via land-use change. 2. In this review and perspective paper, we outline how biodiversity models could be better integrated with regional, climate-driven land-use models. We provide an overview of empirical and modelling approaches to both land-use (LU) and biodiversity (BD) change, focusing on how integration has been attempted. We then analyse how LU and BD model properties, such as scales, inputs, and outputs, can be matched, and identify potential integration challenges and opportunities. 3. We found that LU integration in BD models has been frequently attempted. By contrast, integrating the role of BD in models of LU decisions is largely lacking. As a result, bi-directional effects remain largely understudied. Only a few integrated LU-BD socio-ecological models have assessed climate change effects on LU, and no study has yet investigated the relative contribution of direct vs. indirect effects of climate change on BD. 4. There is large potential for model integration given the overlap in spatial scales, although challenges remain with respect to spatial scale, temporal dynamics, the investigation of indirect effects, and bi-directionality, including feedbacks to climate models. Efforts to better understand human decisions, eco-evolutionary dynamics, connections between terrestrial and aquatic systems, and the standardization of modelling outputs and empirical data formats should improve future models. Integrating biodiversity feedbacks into land-use and climate models requires modelling innovations but should be feasible.