This paper describes an empirical study aimed at identifying the main differences between the logistic regression models and collision data aggregation methods that are commonly applied in the road safety literature for modeling collision severity. In particular, the research compares three popular multilevel logistic models (i.e., sequential binary logit models, ordered logit models, and multinomial logit models) as well as three data aggregation methods (i.e., occupant based, vehicle based, and collision based). Six years of collision data (2001-2006) from 31 highway routes across the province of Ontario, Canada, were used for this analysis. It was found that a multilevel multinomial logit model fits the data better than the other two models, while the results obtained from occupant-based data are more reliable than those from vehicle- and collision-based data. More importantly, while the factors found to be significant are generally consistent across models and data aggregation methods, the effect size of each factor differs substantially, which could have significant implications for evaluating the effects of different safety-related policies and countermeasures.
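To make the three aggregation levels concrete, the following is a minimal sketch of how occupant-based records might be collapsed into vehicle-based and collision-based records. It assumes, purely for illustration, that each aggregated unit is assigned the most severe outcome among its occupants and that severity is coded on a three-level ordinal scale; the abstract does not state the rule or coding the authors used. The records, field order, and the `aggregate_max` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical occupant-level records: (collision_id, vehicle_id, severity).
# Assumed ordinal coding: 0 = no injury, 1 = injury, 2 = fatal.
occupants = [
    ("c1", "v1", 0),
    ("c1", "v1", 1),
    ("c1", "v2", 2),
    ("c2", "v3", 0),
    ("c2", "v3", 0),
]


def aggregate_max(records, key):
    """Collapse occupant records to one entry per key, keeping the worst severity.

    Assumption: the aggregated unit inherits the maximum occupant severity.
    """
    worst = defaultdict(int)  # default 0 matches the lowest severity code
    for rec in records:
        k = key(rec)
        worst[k] = max(worst[k], rec[2])
    return dict(worst)


# Vehicle-based data: one record per (collision, vehicle) pair.
vehicle_based = aggregate_max(occupants, key=lambda r: (r[0], r[1]))

# Collision-based data: one record per collision.
collision_based = aggregate_max(occupants, key=lambda r: r[0])

print(vehicle_based)
print(collision_based)
```

Under this assumed rule, five occupant observations shrink to three vehicle observations and two collision observations, and the lower-severity outcomes within each unit are discarded, which is one way the choice of aggregation level could change estimated effect sizes.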
Most accident prediction models are developed with single-level count data models, such as the traditional negative binomial models with fixed or varying dispersion parameters, assuming independence of data. For many accident data sets in road safety analysis, especially those that are highly disaggregated (e.g., hourly data), a hierarchical structure in the data often manifests as some form of correlation. Crash prediction models developed with aggregate data could produce biased results because they assume data independence and because aggregation inflates the apparent explanatory adequacy of the model. The potential effects of data aggregation and correlation on accident prediction models are investigated. The analysis uses an accident database that includes hour-level and storm-level accident counts for individual winter snowstorms at four highway sections in Ontario, Canada. Models at two levels of aggregation, aggregated event-based models and disaggregated hourly based models, were developed. Data aggregation had a significant effect on model results, whereas the difference between conventional regression and multilevel regression was inconsequential.
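The two levels of aggregation described above can be illustrated with a short sketch that rolls hypothetical hour-level accident counts up into storm-level (event-based) counts. The records, field order, and the `aggregate_by_storm` helper are assumptions made for illustration and are not taken from the study's database.

```python
from collections import defaultdict

# Hypothetical hour-level records: (storm_id, hour_index, accident_count).
hourly = [
    ("s1", 0, 0),
    ("s1", 1, 2),
    ("s1", 2, 1),
    ("s2", 0, 0),
    ("s2", 1, 0),
]


def aggregate_by_storm(records):
    """Sum hourly accident counts within each storm and track its duration in hours."""
    totals = defaultdict(lambda: {"hours": 0, "accidents": 0})
    for storm_id, _hour, count in records:
        totals[storm_id]["hours"] += 1
        totals[storm_id]["accidents"] += count
    return dict(totals)


# Event-based data: one observation per storm instead of one per hour.
storm_level = aggregate_by_storm(hourly)
print(storm_level)
```

In this sketch the five hourly observations become two storm observations, so the hour-to-hour variation within a storm (and any correlation among hours of the same storm) is no longer visible to a model fitted on the aggregated data.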