Abstract. Although several flood databases are available in the United States,
there is a benefit to combining and reconciling these diverse data sources into
a comprehensive flood database with a unified format and easy public
access in order to facilitate flood-related research and applications.
Typically, floods are reported by specialists or media according to their
socioeconomic impacts. More recently, data-driven analyses have reconstructed flood
events based on in situ and/or remote-sensing data. Lately, with the
increasing engagement of citizen scientists, there is the potential to
enhance flood reporting in near-real time. The central objective of this
study is to integrate information from seven popular multi-sourced flood
databases into a comprehensive flood database for the United States that is readily
available to the public in a common data format. Natural language
processing, geocoding, and harmonization steps are undertaken to
facilitate this development. In total, the database contains 698 507 flood records in
the United States from 1900 to the present, making it the longest and most
comprehensive record of flooding across the country. The database
features event locations, durations, date/times, socioeconomic impacts
(e.g., fatalities and economic damages), and geographic information (e.g.,
elevation, slope, contributing area, and land cover types retrieved from
ancillary data for given flood locations). Finally, this study utilizes the
flood database to analyze flood seasonality within major basins and
socioeconomic impacts over time. It is anticipated that this database, the most
comprehensive yet unified one thus far, can support a variety of flood-related
research, such as validation of hydrologic or hydraulic
simulations, hydroclimatic studies concerning spatiotemporal patterns of
floods, and flood susceptibility analysis for vulnerable geophysical
locations. The dataset is publicly available with the following DOI:
https://doi.org/10.5281/zenodo.4547036 (Li, 2020).
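
The harmonization and seasonality steps described above can be illustrated with a minimal sketch. It is not the processing code used in this study: the two source layouts, all field names (e.g., `BEGIN_DATE`, `start`, `fatalities`), and the sample records are hypothetical and were invented only to show how records in different formats might be mapped onto one common schema and then counted by month.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw records from two differently formatted sources.
# Field names and values are invented for illustration only.
SOURCE_A = [
    {"BEGIN_DATE": "06/15/2008", "LAT": "41.66", "LON": "-91.53", "DEATHS": "0"},
    {"BEGIN_DATE": "05/02/2010", "LAT": "36.16", "LON": "-86.78", "DEATHS": "2"},
]
SOURCE_B = [
    {"start": "2008-06-10", "latitude": 41.97, "longitude": -91.67},
]


def harmonize_a(rec):
    """Map a source-A record onto the assumed common schema."""
    return {
        "source": "A",
        "start": datetime.strptime(rec["BEGIN_DATE"], "%m/%d/%Y").date(),
        "lat": float(rec["LAT"]),
        "lon": float(rec["LON"]),
        "fatalities": int(rec["DEATHS"]),
    }


def harmonize_b(rec):
    """Map a source-B record onto the assumed common schema."""
    return {
        "source": "B",
        "start": datetime.strptime(rec["start"], "%Y-%m-%d").date(),
        "lat": float(rec["latitude"]),
        "lon": float(rec["longitude"]),
        "fatalities": None,  # this hypothetical source does not report fatalities
    }


def merge_sources():
    """Combine both sources into one list sorted by start date."""
    records = [harmonize_a(r) for r in SOURCE_A] + [harmonize_b(r) for r in SOURCE_B]
    return sorted(records, key=lambda r: r["start"])


def monthly_counts(records):
    """Count flood records per calendar month as a simple seasonality summary."""
    return Counter(r["start"].month for r in records)


if __name__ == "__main__":
    merged = merge_sources()
    for month, count in sorted(monthly_counts(merged).items()):
        print(month, count)
```

Keeping one small mapping function per source means a new database could be added without touching the common schema or the downstream seasonality summary.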