2020
DOI: 10.31219/osf.io/ux9et
Preprint

Too many zeros and/or highly skewed? A tutorial on modelling health behaviour as count data with Poisson and negative binomial regression

Abstract:

Dependent variables in health psychology can be very strongly skewed, and/or contain large numbers of zeros as well as extreme outliers. For example, “How many cigarettes do you smoke on an average day?” The modal answer may be zero, but may range from 0 to 40+. The same can be true for minutes of moderate to vigorous physical activity. For some people this may be near zero, but take on extreme values for someone training for a marathon. The measures could be counts of behaviour or number of engagements wit…

Citations: cited by 5 publications (5 citation statements)
References: 29 publications
“…Next, the distribution of the outcome variable was checked, and found to be best described by a negative binomial distribution (Supplementary material Figure 1), which is a discrete probability distribution with lower bound at 0, and variance much larger than mean, suggesting the presence of overdispersion [38]. To avoid losing information by dichotomizing the data or using non-parametric tests, a generalized linear model (GLM) for a negative binomial distribution with a log link function was chosen for data analysis. Due to the data structure (providers nested in PHCCs and PHCCs within country), generalized linear mixed models were initially used to test for the inclusion of random effects.…”
Section: Discussion (mentioning)
confidence: 99%
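
The overdispersion check described in the statement above can be illustrated with a minimal sketch. It is not the citing authors' analysis: the `counts` values are made up, the helper names are hypothetical, and the variance formula assumes the common NB2 parameterization, in which the variance is mu + alpha * mu^2 and alpha = 0 reduces to the Poisson case.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Return sample variance divided by sample mean.

    A ratio near 1 is what a Poisson model expects; a ratio well
    above 1 suggests overdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; the ratio is undefined")
    return variance(counts) / m


def nb2_variance(mu, alpha):
    """Variance implied by a negative binomial (NB2) model.

    Assumes the NB2 parameterization: Var(Y) = mu + alpha * mu**2.
    """
    return mu + alpha * mu ** 2


# Hypothetical, illustrative counts: many zeros plus a few extreme values.
counts = [0, 0, 0, 1, 0, 2, 0, 0, 15, 0, 3, 40]
ratio = dispersion_ratio(counts)
print(f"variance / mean = {ratio:.1f}")
```

A ratio far above 1, as with these illustrative counts, is the pattern that usually motivates a negative binomial model over a Poisson model.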
“…To account for this, we used negative binomial regression models. These models are commonly used for count data, but as the sMFQ scores are discrete, independent, and have no negative values, models using count distributions are still applicable (Green, 2020; Kandola et al., 2020). The outcome for these models is interpretable as a percentage change in sMFQ scores.…”
Section: Main Analysis (mentioning)
confidence: 99%
“…Continuous outcomes were modeled using negative binomial regression. Negative binomial regression is a generalization of the Poisson distribution and is suitable to model non-negative count data with overdispersion [42]. To improve interpretation, regression-adjusted coefficients were exponentiated and interpreted as percentages.…”
Section: Discussion (mentioning)
confidence: 99%
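
Two of the statements above interpret exponentiated coefficients as percentages. As a rough sketch of that arithmetic, assuming a log link: a coefficient beta multiplies the expected outcome by exp(beta) for each one-unit increase in the predictor, so the percentage change is (exp(beta) - 1) * 100. The function name and the value of `beta` below are hypothetical and are not taken from any of the cited papers.

```python
import math


def percent_change(beta):
    """Convert a log-link regression coefficient to a percentage change.

    exp(beta) is the multiplicative change in the expected outcome for a
    one-unit increase in the predictor; subtracting 1 and scaling by 100
    expresses it as a percentage.
    """
    return (math.exp(beta) - 1.0) * 100.0


# Hypothetical coefficient, for illustration only.
beta = 0.25
print(f"beta = {beta}: {percent_change(beta):.1f}% change in the expected outcome")
```

A negative coefficient gives a negative percentage, that is, a decrease in the expected outcome.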