Electricity load forecasting is an important task for enhancing the energy efficiency and operational reliability of the power system. Forecasting the hourly electricity load of the next day helps to optimize resources and minimize energy wastage. The main motivation of this study was to improve the robustness of short-term load forecasting (STLF) by combining long short-term memory (LSTM) with a genetic algorithm (GA). In the proposed method, LSTM networks are used because they are designed to avoid the long-term dependency problem, and the GA is used to obtain the optimal LSTM parameters, which are then applied to predict the hourly electricity load for the next day. The proposed method was trained using actual load and weather data, and the performance results showed that it yielded a small mean absolute percentage error (MAPE) on the test data.
With the advance of computing power, deep neural networks (DNNs) have gained much popularity and have been applied to STLF in recent years [36]. LSTM is a special type of DNN that is suitable for time series prediction because it can remember both the short-term and the long-term behavior in time series data. In Reference [37], two types of LSTM were compared against other deep learning techniques for predicting the electricity load of every hour or every minute. The results showed that LSTM outperformed the other techniques. However, all of the neural network approaches above require setting the initial weights of the links in the network, and poorly chosen weights can cause the search process to become trapped in a local optimum. A hybrid of LSTM and GA is proposed in Section 4 to resolve this problem.
Prior to presenting our approach in Section 4, this section gives a detailed description of LSTM. LSTM is an augmented recurrent neural network model. It learns sequential information with long-term dependencies and preserves information for a long period of time.
Traditional recurrent neural networks suffer from the vanishing gradient problem. That is, as the number of layers using the same activation function increases, the gradients of the loss function approach zero, making it difficult to train the network through backpropagation of errors. To prevent the vanishing gradient problem, LSTM utilizes memory cells, where each cell maintains a cell state and a hidden state, and uses three gates (namely, the input gate, the output gate, and the forget gate) to control the flow of information into or out of the cell. A formal explanation of the LSTM model is given below.
LSTM is used for time series modeling, which maps an input sequence $x = \{x_1, x_2, \ldots, x_n\}$ to an output sequence $y = \{y_1, y_2, \ldots, y_n\}$. For the STLF problem under study, each $x_i$ represents the hourly electricity load and the weather data of day $i$, and each $y_i$ represents the hourly electricity load of day $i + 1$, indicating a look-ahead parameter of value one. The LSTM contains layers of memory cells, where the interaction between the LSTM layers is shown in Figure 1, and the architecture of an LSTM cell is shown...
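As a rough illustration of the gating described above, the following Python sketch runs a forward pass of a standard LSTM cell using only the standard library. It is not the implementation used in this study: the function names (`lstm_step`, `run_lstm`, `init_params`), the input dimension of 26 (assumed here to be 24 hourly loads plus two weather features per day), the hidden size, and the random weights are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, U, b, x, h):
    """Return W x + U h + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h))
        + b_k
        for W_row, U_row, b_k in zip(W, U, b)
    ]


def lstm_step(x_t, h_prev, c_prev, params):
    """One forward step of a standard LSTM cell.

    params maps a gate name to its (W, U, b) triple:
    "i" input gate, "f" forget gate, "o" output gate, "g" candidate state.
    """
    pre = {k: affine(W, U, b, x_t, h_prev) for k, (W, U, b) in params.items()}
    i = [sigmoid(z) for z in pre["i"]]    # how much new information enters
    f = [sigmoid(z) for z in pre["f"]]    # how much old cell state is kept
    o = [sigmoid(z) for z in pre["o"]]    # how much cell state is exposed
    g = [math.tanh(z) for z in pre["g"]]  # candidate cell state
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


def init_params(input_dim, hidden_dim, rng):
    """Small random weights and zero biases (an arbitrary illustrative choice)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        k: (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim), [0.0] * hidden_dim)
        for k in "ifog"
    }


def run_lstm(xs, params, hidden_dim):
    """Feed the sequence x_1..x_n through the cell; return h_1..h_n."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    outputs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, params)
        outputs.append(h)
    return outputs


rng = random.Random(0)
input_dim, hidden_dim, n_days = 26, 4, 3  # hypothetical sizes
params = init_params(input_dim, hidden_dim, rng)
xs = [[rng.uniform(0.0, 1.0) for _ in range(input_dim)] for _ in range(n_days)]
hs = run_lstm(xs, params, hidden_dim)
print(len(hs), len(hs[0]))  # 3 4
```

In this formulation the cell state is updated additively (`f * c_prev + i * g`) instead of being passed through an activation function at every step, which is the mechanism usually credited with easing the vanishing gradient problem. To produce a load forecast, a trained output layer would still be needed to map each hidden state to the 24 hourly values of the next day; that layer is omitted from this sketch.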