Background
After claiming nearly five hundred thousand lives globally, the COVID-19 pandemic shows no sign of slowing down. While the UK, USA, Brazil and parts of Asia brace for a second wave (or an extension of the first), it is imperative to identify the primary social, economic, environmental, demographic, ethnic, cultural and health factors contributing to COVID-19 infection and mortality, in order to facilitate mitigation and control measures.
Methods
We process several open-access datasets on US states to create an integrated dataset of factors that potentially drive the spread of the pandemic. We then apply several supervised machine learning approaches to reach a consensus on the key factors and to rank them. Finally, we carry out regression analysis to pinpoint the key pre-lockdown factors that affect post-lockdown infection and mortality, with the aim of informing future lockdown-related policy making.
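As an illustration of the consensus-ranking step, the following minimal sketch fits three supervised models and averages the per-model importance ranks of each feature. It is not the pipeline used in this study: the choice of models, the function name consensus_rank, the feature names and the synthetic data are all assumptions made for the example, and scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


def consensus_rank(X, y, feature_names, seed=0):
    """Order features by their mean importance rank across several models.

    Hypothetical helper: the three models below are stand-ins, not the
    approaches used in the study.
    """
    Xs = StandardScaler().fit_transform(X)  # put all features on one scale
    models = [
        RandomForestRegressor(n_estimators=200, random_state=seed),
        GradientBoostingRegressor(random_state=seed),
        LinearRegression(),
    ]
    ranks = []
    for model in models:
        model.fit(Xs, y)
        if hasattr(model, "feature_importances_"):
            score = model.feature_importances_  # tree-based importance
        else:
            score = np.abs(model.coef_)  # standardized coefficient size
        # Convert scores to ranks, where rank 1 is the most important feature.
        ranks.append(np.argsort(np.argsort(-score)) + 1)
    mean_rank = np.mean(ranks, axis=0)
    order = np.argsort(mean_rank)
    return [(feature_names[i], float(mean_rank[i])) for i in order]


# Synthetic stand-in for a table with one row per US state (not real data).
rng = np.random.default_rng(0)
names = ["population_density", "airport_traffic", "healthcare_index"]
X = rng.normal(size=(50, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=50)

ranking = consensus_rank(X, y, names)
for name, rank in ranking:
    print(f"{name}: mean rank {rank:.2f}")
```

Averaging ranks rather than raw importance scores keeps models with differently scaled importance measures comparable, at the cost of discarding how far apart the features are within any one model.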
Findings
Population density, testing numbers and airport traffic emerge as the most discriminative factors, followed by higher age groups (above 40, and specifically 60+). Post-lockdown infection and death rates are strongly influenced by their pre-lockdown counterparts, followed by population density and airport traffic. The healthcare index appears uncorrelated with mortality rate. Principal component analysis on the key features shows two groups of states: (1) those forming early epicenters and (2) those experiencing a strong second wave or peaking late in the rate of infection and death. Finally, a small case study on New York City shows that days-to-peak for infection in neighboring boroughs correlates better with inter-zone mobility than with inter-zone distance.
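To make the principal component step concrete, the sketch below standardizes a feature table and projects it onto its first two principal components using a singular value decomposition. The function name pca_project and the two synthetic clusters are assumptions for illustration only. They do not reproduce the study's features or its state groupings.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return Z @ Vt[:n_components].T, explained[:n_components]


# Synthetic stand-in: two clusters of 10 "states" each (not real data).
rng = np.random.default_rng(1)
early = rng.normal(loc=[2.0, 2.0, 0.0], scale=0.3, size=(10, 3))
late = rng.normal(loc=[-2.0, -2.0, 0.0], scale=0.3, size=(10, 3))
X = np.vstack([early, late])

scores, explained = pca_project(X)
print("explained variance ratio:", np.round(explained, 2))
print("PC1 sign, first cluster:", np.sign(scores[:10, 0]))
print("PC1 sign, second cluster:", np.sign(scores[10:, 0]))
```

In this toy setting the two clusters fall on opposite sides of the first component. With real state-level data the separation would depend on which features are retained and how they are scaled.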
Interpretation
States forming the early hotspots are regions with high airport or road traffic, which increases human interaction. US states with high population density and high testing tend to exhibit consistently high infection and death counts. Mortality rate seems to be driven by individual physiology, pre-existing conditions and age, rather than by gender, healthcare facilities or ethnic predisposition. Finally, policymaking on the timing of lockdowns should primarily consider the pre-lockdown infection numbers, along with population density and airport traffic.