We have proposed a neural network (NN) model called a deep duel model (DDM) for rescoring N-best speech recognition hypothesis lists. A DDM is composed of a long short-term memory (LSTM)-based encoder followed by a fully-connected linear layer-based binary-class classifier. Given the feature vector sequences of two hypotheses in an N-best list, the DDM encodes the features and selects the hypothesis that has the lower word error rate (WER) based on the output binary-class probabilities. By repeating this one-on-one hypothesis comparison (duel) for each hypothesis pair in the N-best list, we can find the oracle (lowest-WER) hypothesis as the survivor of the duels. We showed that the DDM can exploit the score provided by a forward LSTM-based recurrent NN language model (LSTMLM) as an additional feature to select hypotheses accurately. In this study, we further improve the selection performance by introducing two modifications, i.e., adding the score provided by a backward LSTMLM, which uses succeeding words to predict the current word, and employing ensemble encoders, which have a high feature encoding capability. By combining these two modifications, our DDM achieves a relative WER reduction of more than 10% from a strong baseline obtained using both the forward and backward LSTMLMs.
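The duel-based selection described above can be sketched as a simple elimination loop. This is an illustrative sketch, not the paper's implementation: `duel` is a hypothetical stand-in for the DDM's binary classifier, and each hypothesis is represented here by a (text, WER) pair so an oracle duel can be simulated.

```python
def select_by_duels(nbest, duel):
    """Run one-on-one duels over an N-best list; the survivor is selected.

    `duel(a, b)` stands in for the DDM's binary-class classifier: given two
    hypotheses, it returns the one the model judges to have the lower WER.
    """
    survivor = nbest[0]
    for challenger in nbest[1:]:
        survivor = duel(survivor, challenger)
    return survivor


# Toy usage: hypotheses are (text, true_wer) pairs, and an oracle duel
# simply picks the lower-WER hypothesis, so the oracle hypothesis survives.
if __name__ == "__main__":
    hyps = [("hyp A", 0.25), ("hyp B", 0.10), ("hyp C", 0.18)]
    oracle_duel = lambda a, b: a if a[1] <= b[1] else b
    print(select_by_duels(hyps, oracle_duel))  # the lowest-WER hypothesis
```

With a transitive comparator such as the oracle duel, this single-elimination pass finds the same winner as comparing every pair; the learned DDM is applied in the same survivor-takes-on-the-next-challenger fashion.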