The accurate prediction of bus passenger flow is the key to public transport management and the smart city. A long short-term memory network, a deep learning method for modeling sequences, is an efficient way to capture the time dependency of passenger flow. In recent years, an increasing number of researchers have sought to apply the LSTM model to passenger flow prediction. However, few of them pay attention to the optimization procedure during model training. In this article, we propose a hybrid, optimized LSTM network based on Nesterov accelerated adaptive moment estimation (Nadam) and the stochastic gradient descent algorithm (SGD). This method trains the model with high efficiency and accuracy, solving the problems of inefficient training and misconvergence that exist in complex models. We employ a hybrid optimized LSTM network to predict the actual passenger flow in Qingdao, China and compare the prediction results with those obtained by non-hybrid LSTM models and conventional methods. In particular, the proposed model brings about a 4%–20% extra performance improvements compared with those of non-hybrid LSTM models. We have also tried combinations of other optimization algorithms and applications in different models, finding that optimizing LSTM by switching Nadam to SGD is the best choice. The sensitivity of the model to its parameters is also explored, which provides guidance for applying this model to bus passenger flow data modelling. The good performance of the proposed model in different temporal and spatial scales shows that it is more robust and effective, which can provide insightful support and guidance for dynamic bus scheduling and regional coordination scheduling.