Applying deep learning to intelligent train operation raises several problems, such as models being limited to a single learning task. In particular, when gradient descent is used to optimize the structure, weights, and thresholds of a deep network, training easily becomes trapped in a local optimum, which leads to excessive reliance on manual tuning experience. To address these issues, this paper proposes a new approach to train manipulation and prediction based on a long short-term memory (LSTM) deep network. From the perspective of automatic hyper-parameter optimization, gradient-free intelligent search methods are chosen as the principal means of optimizing the architecture and parameters of the LSTM network, so as to improve manipulation accuracy by learning from excellent drivers. The method first selects excellent driver data using the Pareto dominance principle and crowding distance calculation; on this basis, a step-by-step procedure optimizes the structure, weights, and thresholds of the LSTM network. In the first step, a genetic algorithm searches for the optimal deep network structure, which addresses the difficulty of determining that structure. In the second step, the network parameters are optimized in two stages, 'rough learning' and 'precise learning'. In the 'rough learning' stage, the multi-population chained multi-agent (MPCMA) algorithm preliminarily optimizes the LSTM network parameters. In the 'precise learning' stage, the Adam algorithm further fine-tunes the network parameters. Finally, simulation experiments verify that the proposed method improves the accuracy of train manipulation and prediction and remains robust under multiple manipulation sequences and different temporary speed limits.
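The selection of excellent driver data by Pareto dominance and crowding distance can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not name the objectives, so the sketch assumes that each driving record is scored by a tuple of objectives that are all minimized (for example, energy consumption and punctuality error, both hypothetical here). The crowding distance follows the formulation commonly used in NSGA-II, and the function names `dominates`, `pareto_front`, `crowding_distance`, and `select_runs` are illustrative.

```python
from math import inf


def dominates(a, b):
    """Return True if a Pareto-dominates b (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the indices of points that no other point dominates."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


def crowding_distance(points, front):
    """Return {index: crowding distance} for the indices in front."""
    if not front:
        return {}
    dist = {i: 0.0 for i in front}
    n_obj = len(points[front[0]])
    for m in range(n_obj):
        ordered = sorted(front, key=lambda i: points[i][m])
        lo, hi = points[ordered[0]][m], points[ordered[-1]][m]
        # Boundary records are always kept, so they get infinite distance.
        dist[ordered[0]] = dist[ordered[-1]] = inf
        if hi == lo:
            continue  # this objective cannot separate the records
        for k in range(1, len(ordered) - 1):
            gap = points[ordered[k + 1]][m] - points[ordered[k - 1]][m]
            dist[ordered[k]] += gap / (hi - lo)
    return dist


def select_runs(points, k):
    """Pick up to k non-dominated records, preferring less crowded ones."""
    front = pareto_front(points)
    dist = crowding_distance(points, front)
    ranked = sorted(front, key=lambda i: dist[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Hypothetical (energy, punctuality error) scores for five driving records.
    runs = [(10.0, 5.0), (8.0, 7.0), (12.0, 4.0), (11.0, 6.0), (9.0, 6.0)]
    print("non-dominated records:", pareto_front(runs))
    print("selected records:", select_runs(runs, 3))
```

In this toy data, record 3 is dominated by record 0 and is therefore discarded. Among the remaining records, the two boundary records and the least crowded interior record are kept, so the retained training data stays spread across the trade-off curve instead of clustering in one region.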