In this paper we demonstrate how to efficiently port a software package for Markov chain Monte Carlo (MCMC) simulations on a finite cubic lattice to multiple modern architectures: NVIDIA GPUs of the Pascal, Volta and Turing generations, NEC SX-Aurora TSUBASA vector engines and Intel Xeon Gold processors. In the studied software, the MCMC methodology is used to simulate liquid crystal structures, but it can equally be applied to a wide range of problems in mathematical physics and numerical methods. The main goals of this work are to determine the best software optimization strategy for this class of algorithms and to examine the speed and efficiency of such simulations on modern HPC platforms. We evaluate the effects of various optimizations, such as more suitable memory access patterns, multitasking for efficient utilization of the massive parallelism of the target architectures, improved cache hit rates and parallel workload balancing. We perform a detailed performance analysis for each target platform using software tools such as nvprof, Ftrace and VTune. On this basis, we evaluate and compare the efficiency of the developed computational kernels on the different platforms and rank the platforms by their performance. The results show that the NVIDIA GPU and NEC SX-Aurora TSUBASA platforms, although they appear very different at first glance, require similar optimization approaches in many cases because of similarities in their data processing principles.