We consider a simple long short-term memory (LSTM) neural network extension of the Poisson Lee-Carter model, focusing on procedures for using training data efficiently, combined with ensembling to stabilise predictive performance. We compare the standard approach of withholding the last fraction of observations for validation with two alternatives: sampling a fraction of observations randomly in time, and splitting the population into two parts by sampling individual life histories. We provide empirical and theoretical support for these alternative approaches. Furthermore, to improve the stability of long-term predictions, we consider boosted versions of the Poisson Lee-Carter LSTM. The numerical illustrations show that even when mortality rates are essentially log-linear as a function of calendar time, the boosted model does not perform significantly worse than a simple random walk with drift, and that predictive performance improves when non-linearities are present. Moreover, boosting allows us to obtain reasonable model calibrations from as little as 20 years of data.
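To make the three training/validation procedures concrete, the following is a minimal Python sketch on synthetic data, not the implementation used in the paper. All function names, the toy mortality surface, and the validation fraction are hypothetical. In particular, since only aggregated death counts are simulated here, the population split is approximated by binomial thinning of death counts with proportionally scaled exposures; this is an assumed stand-in for sampling individual life histories.

```python
import numpy as np


def split_last_years(n_years, val_frac):
    """Standard approach: withhold the last fraction of calendar years."""
    n_val = max(1, int(round(val_frac * n_years)))
    mask = np.zeros(n_years, dtype=bool)
    mask[-n_val:] = True  # True marks validation years
    return mask


def split_random_years(n_years, val_frac, rng):
    """Alternative 1: withhold a fraction of calendar years sampled at random."""
    n_val = max(1, int(round(val_frac * n_years)))
    mask = np.zeros(n_years, dtype=bool)
    mask[rng.choice(n_years, size=n_val, replace=False)] = True
    return mask


def split_population(deaths, exposure, val_frac, rng):
    """Alternative 2 (assumed approximation): split the population in two.

    Each death is assigned to the validation part with probability val_frac
    (binomial thinning); exposures are scaled proportionally.
    """
    deaths_val = rng.binomial(deaths, val_frac)
    deaths_train = deaths - deaths_val
    exposure_val = val_frac * exposure
    exposure_train = exposure - exposure_val
    return (deaths_train, exposure_train), (deaths_val, exposure_val)


# Toy data: 5 age groups observed over 20 calendar years (hypothetical values).
rng = np.random.default_rng(0)
n_ages, n_years = 5, 20
exposure = np.full((n_ages, n_years), 10_000.0)
log_rate = -5.0 + 0.3 * np.arange(n_ages)[:, None] - 0.02 * np.arange(n_years)[None, :]
deaths = rng.poisson(exposure * np.exp(log_rate))

mask_last = split_last_years(n_years, val_frac=0.2)
mask_rand = split_random_years(n_years, val_frac=0.2, rng=rng)
(train_d, train_e), (val_d, val_e) = split_population(deaths, exposure, 0.2, rng)

print("validation years (last):  ", np.flatnonzero(mask_last))
print("validation years (random):", np.flatnonzero(mask_rand))
print("training deaths shape:    ", train_d.shape)
```

With the first two schemes the model is trained on the columns where the mask is `False`; with the population split both parts cover every calendar year, so the most recent observations remain available for training.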