Flash floods occur frequently and are widely distributed in mountainous areas because of complex geographic and geomorphic conditions and varied climate types. Effective flash flood forecasting with useful lead times remains a challenge because flash floods develop abruptly and catchments respond quickly. Recently, machine learning has led to substantial changes across many areas of study. In hydrology, the advent of novel machine learning methods has started to encourage novel applications or substantially improve old ones. This study aims to establish a discharge forecasting model based on Long Short-Term Memory (LSTM) networks for flash flood forecasting in mountainous catchments. The proposed LSTM flood forecasting (LSTM-FF) model is composed of T multivariate single-step LSTM networks and takes as inputs the spatial and temporal dynamics of observed and forecast rainfall together with early discharge. The case study in Anhe showed that the proposed model can effectively predict flash floods: the qualified rates (the ratio of the number of qualified events to the total number of flood events) of large flood events are above 94.7% at 1-5 h lead times and range from 84.2% to 89.5% at 6-10 h lead times. For large flood simulation, small flood events can help the LSTM-FF model learn a better rainfall-runoff relationship. The impact analysis of weights in the LSTM network structures shows that the discharge input plays a more prominent role in the 1-h LSTM network and that its effect decreases with lead time. Meanwhile, LSTM networks at adjacent lead times learned similar relationships between input and output. The study provides a new approach for flash flood forecasting, and highly accurate forecasts help in preparing for and mitigating disasters.

Water 2020, 12, 109

concluded no model could make reliable flash flood forecasts in spite of the plausible results of physically-based distributed hydrological models.
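The qualified rate reported in the abstract is defined there only as the ratio of qualified events to the total number of flood events. The following minimal Python sketch shows that ratio under one assumption that is not stated in the text: an event counts as qualified when the relative error of its forecast peak discharge is within a tolerance (20% here, a hypothetical default). The function names `is_qualified` and `qualified_rate` are illustrative and do not come from the study.

```python
from typing import Sequence


def is_qualified(observed_peak: float, forecast_peak: float,
                 tolerance: float = 0.2) -> bool:
    """Return True if the relative peak error is within `tolerance`.

    ASSUMPTION: the qualification criterion (relative peak-discharge
    error <= 20%) is illustrative; the source text does not define it.
    """
    if observed_peak <= 0:
        raise ValueError("observed peak discharge must be positive")
    relative_error = abs(forecast_peak - observed_peak) / observed_peak
    return relative_error <= tolerance


def qualified_rate(observed_peaks: Sequence[float],
                   forecast_peaks: Sequence[float],
                   tolerance: float = 0.2) -> float:
    """Ratio of qualified flood events to the total number of events."""
    if len(observed_peaks) != len(forecast_peaks):
        raise ValueError("both sequences must describe the same events")
    if len(observed_peaks) == 0:
        raise ValueError("at least one flood event is required")
    n_qualified = sum(
        is_qualified(obs, fc, tolerance)
        for obs, fc in zip(observed_peaks, forecast_peaks)
    )
    return n_qualified / len(observed_peaks)
```

For example, with four hypothetical events whose observed peaks are 100, 200, 300, and 400 m³/s and forecast peaks are 110, 150, 310, and 400 m³/s, only the second event exceeds the assumed tolerance, so three of four events qualify.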
Many studies have shown that distributed hydrological models have advantages over lumped hydrological models and data-driven models, but they are computationally inefficient and require sophisticated, high-resolution input data (e.g., DEM, land-use and soil maps, and soil characteristics) [5]. Hence, their applicability in mountainous catchments is limited. The expected benefits of using high-resolution distributed models might also be masked by increasing uncertainties at small scales [6]. Lumped hydrological models for flash flood forecasting are limited by their coarse resolution and their inadequate description of the spatial distribution of rainfall, which has a great impact on the catchment response [2]. In addition, physically-based hydrological models depend heavily on their boundary conditions, which are often poorly defined [7]. Because the underlying processes are complex, it is difficult to describe flash flood generation and propagation with a deterministic approach. Numerous studies point to the limitations of physically-based hydrological models in short-term flood prediction [8]. With the advancements in sys...