Since stochastic computing performs operations using streams of bits that represent probability values instead of deterministic values, it can tolerate a large number of failures in a noisy system. However, the simulation of a stochastic implementation is extremely time-consuming. In this paper, we investigate two approaches to speed up the stochastic simulation: a GPU-based simulation and an OpenMP-based simulation. To compare these two approaches, we start with several basic stochastic computing elements (SCEs) and then use the stochastic implementation of a frame difference-based image segmentation algorithm as case study to conduct extensive experiments. Measured results show that the GPU-based simulation with 448 processing elements can achieve up to 119x performance speedup compared to the single-threaded CPU simulation and 17x performance speedup over the OpenMP-based simulation with eight processor cores. In addition, we present several performance optimisations for the GPU-based simulation which significantly benefit the performance of stochastic simulation. Minneapolis. He has worked as a development engineer at Tandem Computers Inc. and has chaired and served on the programme committees of numerous conferences. He was elected a Fellow of the IEEE and the AAAS. His main research interests include computer architecture, computer systems performance analysis, and high-performance storage systems, with a particular interest in the interaction of computer architecture with software, compilers, and circuits. This paper is a revised and expanded version of a paper entitled 'A GPU-based simulation for stochastic computing' presented at the