In this work, we focus on a systematic adaptation of the stencil-based multidimensional positive definite advection transport algorithm (MPDATA) to different graphics processing unit (GPU)-based computing platforms. Another objective of this work is to compare the performance of MPDATA on several platforms, including a multi-GPU system with two NVIDIA Tesla K80 cards, and single-card platforms with Tesla K20X, GeForce GTX TITAN, and GeForce GTX 980. The usage of the following optimization methods is proposed to improve the overall performance: (i) reducing the number of operations by the subexpression elimination when implementing 2.5D blocking; (ii) reorganization of boundary conditions for reducing branch instructions; (iii) advanced memory management to increase the coalesced memory access; and (iv) warps rearrangement for optimizing the data access to GPU global memory. The presented methods of the MPDATA adaptation to GPU architectures allow us to efficiently use many graphics processors within a single node by applying peer-to-peer data transfers between GPU global memories. We propose an auto-tuning procedure to compensate architectural differences between the considered platforms. This procedure takes into account algorithm/GPU-specific parameters. The proposed approach to adaptation of MPDATA to GPU architectures allows us to achieve up to 482.5 Gflop/s for the platform equipped with two NVIDIA K80 GPUs. for simulating thermo-fluid flows across a wide range of scales and physical scenarios, such as numerical weather and climate prediction, simulation of urban flows, areas of turbulence, ocean currents, and others. Recently, the dynamical core of EULAG has been implemented into consortium for small-scale modeling weather prediction framework and is expected to be in operational use [5]. The dynamical core of EULAG is based on the non-hydrostatic Euler equations, either fully compressible or anelastic. The model employs the generalized curvilinear coordinate description, finite-volume non-oscillatory transport MPDATA, and advanced elliptic solver generalized conjugate residual (GCR) [6].To be able to run the existing codes efficiently on new hybrid platforms with accelerators, it is necessary to redesign structures of these codes [7]. In our previous work [8], we proposed two decompositions of 2D MPDATA computations, which provide adaptation to CPU and GPU architectures. We developed a hybrid CPU-GPU version of 2D MPDATA in order to fully utilize all the available computing resources. The next step in our research was to parallelize the 3D version of MPDATA. It required to develop a different approach than for the 2D version. In papers [7,9], we presented an analysis of resources usage in GPU, and its influence on the resulting performance. We detected the bottlenecks and developed a method for the efficient distribution of computation across GPU kernels.Following our previous papers, in this work, we propose a set of methods for adaptating the 3D MPDATA to different GPU accelerators. We investigate differ...