OBJECTIVES. The purpose of this study was to identify circumstances in which biochemical assessments of smoking produce systematically higher or lower estimates of smoking than self-reports. A secondary aim was to evaluate different statistical approaches to analyzing variation in validity estimates. METHODS. Literature searches and personal inquiries identified 26 published reports containing 51 comparisons between self-reported behavior and biochemical measures. The sensitivity (the proportion of biochemically identified smokers who reported smoking) and specificity (the proportion of biochemically identified nonsmokers who reported not smoking) of self-reports were calculated for each study as measures of accuracy. RESULTS. Sensitivity ranged from 6% to 100% (mean = 87.5%), and specificity ranged from 33% to 100% (mean = 89.2%). Interviewer-administered questionnaires, observational studies, reports by adults, and biochemical validation with plasma cotinine were associated with higher estimates of sensitivity and specificity. CONCLUSIONS. Self-reports of smoking are accurate in most studies. To improve accuracy, biochemical assessment, preferably with plasma cotinine, should be considered in intervention studies and in student populations.
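The sketch below shows how the two accuracy measures are computed from paired data. It treats the biochemical measure as the reference standard, as the definitions above do. The function name `self_report_validity` and its input layout (parallel lists of booleans, one pair per participant) are hypothetical and for illustration only. This is not the authors' analysis code.

```python
def self_report_validity(self_report, biochemical):
    """Return (sensitivity, specificity) of self-report against a biochemical reference.

    Hypothetical helper: each argument is a list of booleans, one entry per
    participant, where True means "smoker" by that measure.
    """
    tp = fn = tn = fp = 0
    for reported, verified in zip(self_report, biochemical):
        if verified:
            # Biochemically identified smoker: did they report smoking?
            if reported:
                tp += 1
            else:
                fn += 1
        else:
            # Biochemically identified nonsmoker: did they report not smoking?
            if reported:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference smoker and one reference nonsmoker")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


if __name__ == "__main__":
    reports = [True, True, False, False, False, True]
    cotinine = [True, True, True, False, False, False]
    print(self_report_validity(reports, cotinine))
```

In this toy example, one reference smoker denies smoking and one reference nonsmoker reports smoking. Both measures therefore come out at 2/3.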
It is widely but incorrectly believed that the t-test and linear regression are valid only for Normally distributed outcomes. The t-test and linear regression compare the mean of an outcome variable across groups of subjects. These methods are valid even in very small samples if the outcome variable is Normally distributed, but their major usefulness comes from the fact that in large samples they are valid for any distribution. We demonstrate this validity by simulation in extremely non-Normal data. We discuss situations in which other methods, such as the Wilcoxon rank sum test and ordinal logistic regression (the proportional odds model), have been recommended, and conclude that the t-test and linear regression often provide a convenient and practical alternative. The major limitation of the t-test and linear regression for inference about associations is not a distributional one, but whether detecting and estimating a difference in the mean of the outcome answers the scientific question at hand.
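The simulation below is a minimal illustration of the large-sample argument. It is not the authors' simulation.

Both groups are drawn from the same strongly right-skewed distribution, so any rejection of "equal means" is a false positive. The empirical type I error rate should therefore sit close to the nominal 5%.

Several choices here are assumptions made for illustration:
- the lognormal(0, 1) distribution;
- 200 subjects per group;
- 1,000 replications;
- the Normal critical value 1.96 in place of the exact t quantile.

```python
import math
import random


def welch_t(x, y):
    """Welch two-sample t statistic for the difference in means of x and y."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)


def type_one_error_rate(n=200, reps=1000, seed=1):
    """Fraction of null comparisons rejected at the two-sided 5% level.

    Both groups come from the same lognormal(0, 1) distribution (an assumed,
    heavily skewed example), so every rejection is a false positive. With
    large n the t statistic is compared with the Normal critical value 1.96.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
        y = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
        if abs(welch_t(x, y)) > 1.96:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print(f"Empirical type I error: {type_one_error_rate():.3f}")
```

The Welch form of the statistic is used because it does not assume equal variances. With equal group sizes, it gives the same statistic as the pooled-variance t-test.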
Important questions about health care are often addressed by studying health care utilization. Utilization data have several characteristics that make them challenging to analyze. In this paper, we discuss sources of information, the statistical properties of utilization data, common analytic methods including the two-part model, and some newly available statistical methods, including the generalized linear model. We also address issues of study design and new methods for dealing with censored data. Examples are presented.
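The two-part model rests on the identity E[Y] = P(Y > 0) × E[Y | Y > 0]. Part one models whether any use occurs, and part two models the amount of use among users.

The sketch below is a deliberately simplified version. It replaces the usual regression models with a single grouping variable: group proportions of users, and group mean costs among users. The names `two_part_means` and `records` are hypothetical.

In a full two-part model, part one would typically be a logistic or probit regression. Part two would be a regression for positive costs, such as a log-scale linear model or a generalized linear model.

```python
from collections import defaultdict


def two_part_means(records):
    """Group-wise two-part decomposition of mean cost.

    Hypothetical helper: `records` is an iterable of (group, cost) pairs.
    Returns {group: (p_any_use, mean_cost_among_users, overall_mean_cost)},
    where overall_mean_cost = p_any_use * mean_cost_among_users.
    """
    by_group = defaultdict(list)
    for group, cost in records:
        by_group[group].append(cost)

    result = {}
    for group, costs in by_group.items():
        users = [c for c in costs if c > 0]                  # part two: users only
        p_any = len(users) / len(costs)                      # part one: P(Y > 0)
        mean_if_user = sum(users) / len(users) if users else 0.0
        result[group] = (p_any, mean_if_user, p_any * mean_if_user)
    return result


if __name__ == "__main__":
    data = [("A", 0), ("A", 0), ("A", 0), ("A", 100), ("A", 300), ("B", 0), ("B", 50)]
    for group, parts in sorted(two_part_means(data).items()):
        print(group, parts)
```

For group A, 2 of 5 people have any cost, and their mean cost is 200. The product, 80, equals the ordinary mean of all five costs. The decomposition separates "who uses care" from "how much users spend" without changing the overall mean.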