SUMMARY
The Highway Safety Manual (HSM) recommends using the empirical Bayes method with locally derived calibration factors to predict an agency's safety performance. Deriving these local calibration factors requires substantial data, in particular very detailed information on roadway characteristics. Many of these data variables are currently unavailable in most agencies' databases, and it is not economically feasible to collect and maintain all of the HSM data variables. This study aims to prioritize the HSM calibration variables based on their impact on crash predictions. Prioritization helps identify the influential variables for which data could be collected and maintained for continued updates, thereby reducing intensive data collection efforts. Data were first collected for all HSM variables on more than 2400 miles of urban and suburban arterial roads in Florida. Using 5 years (2008 to 2012) of crash data, a random forests data mining approach was then applied to measure the importance of each variable in crash frequency predictions for five urban and suburban arterial facility types: two-lane undivided, three-lane with a two-way left-turn lane, four-lane undivided, four-lane divided, and five-lane with a two-way left-turn lane. Two heuristic approaches were adopted to prioritize the variables: (i) simple ranking based on the individual relative influence of each variable; and (ii) clustering of variables whose relative influence falls within a specific range. Traffic volume was found to be the most influential variable. Roadside object density, minor commercial driveway density, and minor residential driveway density also had significant influence on crash predictions.
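The two prioritization heuristics can be illustrated with a minimal sketch. It assumes that the random forest has already produced a raw importance score per variable (for example, mean decrease in accuracy or in node impurity; the abstract does not say which measure was used), that "relative influence" means each score as a percentage of the total, and that the range-based clustering uses fixed cutoffs. The cutoff values (20%, 10%, 5%), the function names, and the variable names and scores in the usage example are all hypothetical and are not taken from the study.

```python
from typing import Dict, List, Sequence, Tuple


def relative_influence(importance: Dict[str, float]) -> Dict[str, float]:
    """Normalize raw importance scores so they sum to 100 (percent).

    Negative scores (possible with permutation importance) are clipped to zero.
    """
    total = sum(max(v, 0.0) for v in importance.values())
    if total == 0:
        raise ValueError("all importance scores are zero or negative")
    return {name: 100.0 * max(v, 0.0) / total for name, v in importance.items()}


def rank_variables(importance: Dict[str, float]) -> List[Tuple[str, float]]:
    """Heuristic (i): simple ranking by individual relative influence, highest first."""
    ri = relative_influence(importance)
    return sorted(ri.items(), key=lambda kv: kv[1], reverse=True)


def cluster_by_range(
    importance: Dict[str, float],
    cutoffs: Sequence[float] = (20.0, 10.0, 5.0),  # hypothetical thresholds, in percent
) -> List[List[str]]:
    """Heuristic (ii): group variables whose relative influence falls in the same range.

    Tier 0 holds RI >= cutoffs[0], tier 1 holds cutoffs[1] <= RI < cutoffs[0], and so on;
    the last tier holds everything below the smallest cutoff.
    """
    ri = relative_influence(importance)
    tiers: List[List[str]] = [[] for _ in range(len(cutoffs) + 1)]
    for name, value in sorted(ri.items(), key=lambda kv: kv[1], reverse=True):
        # First cutoff the value meets; fall through to the lowest tier otherwise.
        tier = next((i for i, c in enumerate(cutoffs) if value >= c), len(cutoffs))
        tiers[tier].append(name)
    return tiers


if __name__ == "__main__":
    # Made-up importance scores for placeholder variables (NOT results from the study).
    scores = {"var_a": 40, "var_b": 25, "var_c": 15, "var_d": 12, "var_e": 8}
    print(rank_variables(scores))    # [('var_a', 40.0), ('var_b', 25.0), ...]
    print(cluster_by_range(scores))  # [['var_a', 'var_b'], ['var_c', 'var_d'], ['var_e'], []]
```

Ranking treats every variable individually, so two variables with nearly equal influence can land on either side of an arbitrary "top k" cut. Range-based clustering instead keeps such variables together, which may suit decisions about which groups of data items to collect first.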