This paper documents the setup and validation of nonlinear autoregressive network with exogenous inputs (NARX) models of a heavy-duty single-shaft gas turbine (GT). The models are trained on time-series datasets acquired experimentally on a General Electric PG 9351FA GT during several start-up maneuvers, covering cold, warm, and hot start-up. The trained NARX models are then used to predict other experimental datasets, and the model outputs are compared with the corresponding measured data. The paper thus addresses the challenge of setting up robust and reliable NARX models by means of a sound selection of training datasets and a sensitivity analysis on the number of neurons. Moreover, a new performance function for the training process is defined to give greater weight to the most rapid transients. The final aim is a powerful, easy-to-build, and accurate simulation tool with good generalization capability, which can be used for both control logic tuning and GT diagnostics.
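
For orientation, a NARX model is commonly written as a nonlinear map from lagged outputs and lagged exogenous inputs to the current output. The sketch below states that generic form together with one possible shape for a transient-weighted training error. The symbols $n_y$, $n_u$, and $w_k$, and the weighted-error form itself, are illustrative assumptions and not the definitions adopted in this paper.

```latex
% Generic NARX form: f is the nonlinear map realized by the neural network,
% y is the measured output, u the exogenous input, n_y and n_u the lag orders.
\hat{y}(t) = f\bigl(y(t-1), \dots, y(t-n_y),\; u(t-1), \dots, u(t-n_u)\bigr)

% Illustrative weighted mean squared error over N samples (assumed form).
E_w = \frac{1}{N} \sum_{k=1}^{N} w_k \bigl(y_k - \hat{y}_k\bigr)^2, \qquad w_k \ge 0
```

With all $w_k = 1$ this reduces to the ordinary mean squared error. Assigning larger $w_k$ to samples where the measured signal changes quickly is one way such a function could emphasize rapid transients.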