Model-based development is a popular approach in which software is implemented and verified against a model of the required system. Finite state machines (FSMs) are widely used as models for systems in several domains. Validating that a model accurately represents the required behaviour requires generating and executing a large number of input sequences, which is often expensive and time-consuming. In this paper, we speed up the execution of input sequences for FSM validation by leveraging the high degree of parallelism of modern graphics processing units (GPUs) to execute FSM input sequences automatically and in parallel on GPU threads. We extend our existing work with techniques that improve the performance and scalability of this approach. We conduct an extensive empirical evaluation using 15 large FSMs from the networking domain and measure GPU speed-up over a 16-core CPU, taking into account total GPU time, which includes both data transfer and kernel execution time. We found that GPUs execute FSM input sequences up to 9.28× faster than a 16-core CPU, with an average speed-up of 4.53× across all subjects. Our optimizations achieve an average improvement of 58.95% in speed-up over existing work and scale to large FSMs with over 2K states and 500K transitions. We also found that techniques aimed at reducing the number of required input sequences for large FSMs with high density were ineffective when applied to all-transition pair coverage, which emphasizes the need for approaches like ours that speed up input execution.
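To illustrate why this workload parallelizes well, the following minimal sketch executes several input sequences against a small FSM, one independent task per sequence. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the three-state transition table, the names `TRANSITIONS`, `run_sequence`, and `run_all`, and the use of a CPU thread pool as a stand-in for GPU threads are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical 3-state FSM with a binary input alphabet.
# TRANSITIONS[state][input_symbol] -> (next_state, output)
TRANSITIONS = [
    [(1, 0), (0, 1)],  # state 0
    [(2, 1), (0, 0)],  # state 1
    [(2, 0), (1, 1)],  # state 2
]


def run_sequence(sequence, start_state=0):
    """Execute one input sequence and return (final_state, outputs)."""
    state = start_state
    outputs = []
    for symbol in sequence:
        state, out = TRANSITIONS[state][symbol]
        outputs.append(out)
    return state, outputs


def run_all(sequences, workers=4):
    """Run every sequence as its own task.

    No sequence reads or writes another sequence's state, which is the
    property that would let each one be assigned to a separate GPU thread.
    A CPU thread pool is used here only as a stand-in (assumption).
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_sequence, sequences))


if __name__ == "__main__":
    for final_state, outputs in run_all([[0, 0, 1], [1, 1], []]):
        print(final_state, outputs)
```

The transition table is read-only and shared by every task, while the current state and the outputs are private to each task. On a GPU, the table and the input sequences would additionally have to be copied to device memory before execution, which is why data transfer time is counted in the total GPU time.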