Previously, a nonlinear autoregressive network with exogenous input (NARX) demonstrated excellent performance for defect detection in guided wave signals, far outperforming an established method in optimal baseline subtraction. The principle is to train a NARX network on defect-free guided wave signals to obtain a filter that predicts the next point in the signal from the previous points. The trained network is then applied to a new measurement, and its output is subtracted from the measurement to reveal the presence of defect responses. However, as shown in this paper, the previous NARX implementation lacks robustness: its performance is highly dependent on the initialisation of the network, and detection performance sometimes improves and then worsens over the course of training. It is shown that this is because the previous NARX implementation only makes predictions one point ahead. It is then shown that multi-step prediction using a newly proposed NARX structure creates a more robust training procedure by enhancing the correlation between the training loss metric and the defect detection performance. The physical significance of the network structure is explored, allowing a simple hyperparameter tuning strategy to be used to determine the optimal structure. Multi-step prediction also improves the overall detection performance of NARX. This is demonstrated on defect responses at different times as well as on data from different sensor pairs, revealing the generalisability of the method.
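
The filtering principle described above (fit a one-point-ahead predictor on defect-free signals, then subtract its output from a new measurement) can be illustrated with a minimal sketch. It is not the NARX implementation of this paper: it assumes a linear least-squares predictor standing in for the network, omits the exogenous input, and uses noise-free synthetic tones rather than guided wave data. All function names, frequencies and amplitudes are hypothetical choices made for illustration.

```python
import numpy as np


def lagged_pairs(signal, order):
    """Pair every window of `order` consecutive samples with the sample that follows it."""
    windows = np.lib.stride_tricks.sliding_window_view(signal, order)[:-1]
    return windows, signal[order:]


def fit_predictor(baselines, order):
    """Least-squares one-point-ahead weights, fitted on defect-free signals only."""
    pairs = [lagged_pairs(s, order) for s in baselines]
    X = np.vstack([p[0] for p in pairs])
    y = np.concatenate([p[1] for p in pairs])
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights


def residual(measurement, weights):
    """Measurement minus its one-point-ahead prediction.

    The first `order` samples have no full history and are left at zero.
    """
    order = len(weights)
    windows, targets = lagged_pairs(measurement, order)
    out = np.zeros(len(measurement))
    out[order:] = targets - windows @ weights
    return out


# Assumption: the defect-free signal is a sum of two stationary tones, which a
# linear predictor of order 4 can reproduce exactly. Real signals are not this simple.
n = np.arange(600)


def defect_free(phase):
    return np.sin(0.3 * n + phase) + 0.5 * np.sin(0.5 * n + 2 * phase)


baselines = [defect_free(p) for p in (0.0, 0.7, 1.9)]
weights = fit_predictor(baselines, order=4)

# Assumption: the defect response is a short, weak Hann-windowed burst, placed at a
# different frequency so that this toy linear filter cannot predict it.
start, length = 400, 40
echo = np.zeros(n.size)
echo[start:start + length] = 0.05 * np.hanning(length) * np.sin(1.2 * np.arange(length))

clean = defect_free(1.1)
measurement = clean + echo

clean_residual = residual(clean, weights)
defect_residual = residual(measurement, weights)
peak = int(np.argmax(np.abs(defect_residual)))
```

In this toy setting the residual of the defect-free measurement stays at numerical-precision level, while the residual of the measurement containing the burst peaks inside the burst window. A linear filter is used only to keep the sketch short and deterministic; it says nothing about the robustness issues or the multi-step structure studied in the paper.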