Purpose
Geocoded birth registry data are increasingly used in environmental epidemiology research, and records that cannot be geocoded are routinely excluded.
Methods
We used classification and regression tree (CART) analysis and logistic regression to investigate potential selection bias associated with this exclusion among all singleton births in Florida in 2009 (N=210,285).
Results
The rate of unsuccessful geocoding was 11.5% (n=24,171) and ranged from 0% to 100% across zip codes. Living in a rural zip code was the strongest predictor of being ungeocoded. Other predictors of geocoding status varied by urban versus rural status. In urban areas, maternal race [adjusted odds ratio (aOR) ranging from 1.08 for Hispanic to 1.18 for Black, compared with White], maternal age [aOR: 1.16 (1.10-1.23) for ages 20-34 compared with <20], maternal nativity [aOR: 1.20 (1.15-1.25) for non-US vs. US born], delivery at a birth center [aOR: 1.72 (1.49-2.00) compared with hospital delivery], multiparity [aOR: 0.91 (0.88-0.94)], maternal smoking [aOR: 0.82 (0.76-0.88)], and having non-private insurance [aOR: 1.25 (1.20-1.30) for Medicaid vs. private insurance] were significantly associated with being ungeocoded. In rural areas, births at a birth center [aOR: 2.91 (1.80-4.73)] or at home [aOR: 1.94 (1.28-2.95)] had increased odds of being ungeocoded compared with hospital births. The characteristics predictive of being ungeocoded were also significantly associated with adverse birth outcomes such as low birthweight and preterm delivery, and the association for maternal age differed depending on whether ungeocoded births were included or excluded.
Conclusions
Geocoding status is not random. Women with certain exposure-outcome characteristics may be more likely to be ungeocoded and excluded, indicating potential selection bias.