Fast simulation, i.e., automatic computation of sequential runs, is widely used to analyse Petri nets. In particular, it enables for quantitative statistical analysis by observing large sets of runs. Moreover, fast simulation may be used to actually run a Petri net model as a (prototype) implementation of a system, in which case such a net would embed fragments of the code of the system. In both these contexts, being able to perform faster simulation is highly desirable. In this paper, we propose a way to accelerate fast simulation by exploiting parallel computing, targeting both the multi-core cpus available nowadays in every laptop or workstation, and larger parallel computers including those with distributed memory (clusters). We design an algorithm to do so and assess in particular its correctness and completeness through its formal modelling as a Petri net whose state space is analysed. We also present a benchmark of a prototype implementation that clearly shows how our algorithm effectively accelerates fast simulation, in particular in the case of large concurrent coloured Petri nets, which is precisely the kind of nets that are usually slow to simulate.