Abstract. Eastern China is one of the most economically developed and
densely populated areas in the world. Due to its special geographical
location and climate, eastern China is affected by different weather systems,
such as monsoons, shear lines, typhoons, and extratropical cyclones. Because
of these systems, the near-future rainfall rate is difficult to predict
precisely. Traditional physics-based methods such as numerical weather
prediction (NWP) tend to perform poorly on nowcasting problems due to the
spin-up issue. Moreover, the many meteorological stations distributed across
this region generate a large amount of observation data every day, which
has great potential for use in data-driven methods. Thus, it is
important to train a data-driven model from scratch that is suitable for the
specific weather situation of eastern China. However, due to the high degrees
of freedom and nonlinearity of machine learning algorithms, it is difficult
to add physical constraints. Therefore, with the intention of using various
kinds of data as a proxy for physical constraints, we collected three kinds
of data (radar, satellite, and precipitation data) in the flood season from
2017 to 2018 in this area and preprocessed them into tensors (256×256) that cover eastern China with a domain of 12.8° × 12.8°.
The developed multisource data model (MSDM) combines the optical flow,
random forest, and convolutional neural network (CNN) algorithms. It treats
the precipitation nowcasting task as an image-to-image problem, which takes
radar and satellite data with an interval of 30 min as inputs and
predicts radar echo intensity with a lead time of 30 min. To reduce the
smoothing caused by convolutions, we use the optical flow algorithm to
predict satellite data over the following 120 min. The predicted radar
echoes from the MSDM, together with the satellite data from the optical flow
algorithm, are recursively fed back into the MSDM to achieve a 120 min
lead time. The MSDM predictions are comparable to those of other baseline
models with a high temporal resolution of 6 min. To mitigate blurry
predictions, we apply a modified structural similarity (SSIM) index as a loss
function. Furthermore, we use the random forest algorithm with predicted
radar and satellite data to estimate the rainfall rate, and the results
outperform those of the traditional, nonlinear radar reflectivity factor and
rainfall rate (Z–R) relationships that use logarithmic functions. The
experiments confirm that machine learning with multisource data provides
more reasonable predictions and reveals a better nonlinear relationship
between radar echo and precipitation rate. Beyond developing more complicated
machine learning algorithms, exploiting the potential of multisource data
will yield further improvements.
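
For reference, the traditional Z–R baseline mentioned above can be sketched as
follows. This is a minimal illustration, not the configuration used in this
study: it assumes the common power-law form Z = a·R^b with the Marshall-Palmer
coefficients a = 200 and b = 1.6, and the function names are hypothetical.

```python
import math


def dbz_to_rain_rate(dbz, a=200.0, b=1.6):
    """Estimate rainfall rate R (mm/h) from radar reflectivity in dBZ.

    Assumes the power law Z = a * R**b. The defaults a=200, b=1.6 are the
    Marshall-Palmer coefficients, chosen here only for illustration.
    """
    z_linear = 10.0 ** (dbz / 10.0)  # dBZ is logarithmic: dBZ = 10 * log10(Z)
    return (z_linear / a) ** (1.0 / b)


def rain_rate_to_dbz(rate, a=200.0, b=1.6):
    """Forward relation: reflectivity in dBZ for a rainfall rate in mm/h."""
    return 10.0 * math.log10(a * rate ** b)


if __name__ == "__main__":
    # Print the estimated rainfall rate for a few reflectivity values.
    for dbz in (20.0, 35.0, 50.0):
        print(f"{dbz:.0f} dBZ -> {dbz_to_rain_rate(dbz):.2f} mm/h")
```

A fixed pair (a, b) maps each reflectivity value to exactly one rainfall rate,
whereas a learned estimator such as a random forest can also draw on
additional inputs such as satellite data.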