Separation is a complication that frequently arises when sparse data are analysed with logistic regression models. Analyses of such datasets tend to be biased, resulting in misleading conclusions, or may not be feasible at all because of computational problems such as non-convergence. In this report, a systematic literature review (SLR) was carried out to describe the phenomenon and the methods that deal with separation in binary response models, especially logistic regression-type models. The methods are categorized according to the statistical paradigm or framework in which they were proposed and studied: Likelihood, Bayesian, GEE and cluster-specific/random effects approaches.
Disclaimer: The present document has been produced and adopted by the bodies identified above as author(s). This task has been carried out exclusively by the author(s) in the context of a contract between the European Food Safety Authority and the author(s), awarded following a tender procedure. The present document is published complying with the transparency principle to which the Authority is subject. It may not be considered as an output adopted by the Authority. The European Food Safety Authority reserves its rights, view and position as regards the issues addressed and the conclusions reached in the present document, without prejudice to the rights of the authors.
Acknowledgements: The authors would like to thank the EFSA staff member José Cortiñas Abrahantes for the support provided to this scientific output.
Summary: The logistic regression model is widely used as the standard statistical approach to study the influence of determinant factors on the presence or absence of a hazardous event in any organism, or on a particular safety status of foods or feeding stuffs. A common statistical problem in the application of logistic models to binary response data is separation, which, if not dealt with appropriately, may compromise the inference process and lead to biased conclusions, for example regarding the relevance of a particular factor to the presence or absence of an effect on an organism (the analysis might indicate no effect while the factor in fact strongly influences the outcome under study). Sparse datasets analysed by logistic regression models therefore deserve closer attention. To gain more information about the effects of separation and about the methods that deal with it, a systematic literature review (SLR) was carried out according to the review protocol. The methods and approaches dealing with separation are categorized according to the statistical paradigm or framework in which they were proposed and studied: Likelihood, Bayesian, GEE and cluster-specific/random effects approaches.
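To make the non-convergence problem concrete, the following minimal Python sketch illustrates why maximum likelihood estimation breaks down under complete separation. It is an illustration only and not a method taken from the report: it assumes a single covariate and a logistic model without intercept, the function names `log_likelihood` and `is_completely_separated` are hypothetical, and the two toy datasets are invented for demonstration. With separated data the log-likelihood keeps increasing as the coefficient grows, so no finite maximum exists; with overlapping data it peaks at a finite value.

```python
import math


def log_likelihood(beta, xs, ys):
    """Log-likelihood of the no-intercept model P(y=1 | x) = 1 / (1 + exp(-beta * x))."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = beta * x
        # log P(y | x) = -log(1 + exp(z)), with z = -eta if y == 1, else z = eta
        z = -eta if y == 1 else eta
        # numerically stable evaluation of log(1 + exp(z))
        total -= max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return total


def is_completely_separated(xs, ys):
    """True if one threshold on x splits the y=0 and y=1 observations perfectly."""
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    if not x0 or not x1:
        # Assumption of this sketch: a single observed outcome is treated as separated.
        return True
    return max(x0) < min(x1) or max(x1) < min(x0)


# Toy data, invented for illustration only.
sep_x, sep_y = [-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1]   # x < 0 always gives y = 0
ovl_x, ovl_y = [-2.0, -1.0, 1.0, 2.0], [0, 1, 0, 1]   # outcomes overlap in x

betas = [1.0, 5.0, 10.0, 20.0]
sep_ll = [log_likelihood(b, sep_x, sep_y) for b in betas]
ovl_ll = [log_likelihood(b, ovl_x, ovl_y) for b in betas]

print("separated:", is_completely_separated(sep_x, sep_y), sep_ll)
print("overlapping:", is_completely_separated(ovl_x, ovl_y), ovl_ll)
```

In the separated case the printed log-likelihood values approach zero from below without ever reaching a maximum, which is the behaviour that causes iterative fitting routines to drift towards infinite estimates or to stop without converging.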