Monte Carlo (MC) sampling is the standard approach for uncertainty propagation in problems with high-dimensional stochastic inputs. Various acceleration techniques have been developed to overcome the slow convergence of MC estimates, such as multilevel Monte Carlo (MLMC). MLMC reduces the estimator variance by combining successive approximations computed on a hierarchy of levels, i.e., models of different accuracy and computational cost. MLMC analytically determines the number of samples required on each level to achieve a given accuracy at minimal cost. We propose an extension of the original MLMC theoretical framework to modern, heterogeneous computer architectures in which accelerators (GPUs) are available and samples can therefore be distributed both across levels and across compute units (CPUs and GPUs). We derive the optimal sample allocation for the proposed MLMC extension by solving a convex optimization problem. We apply the MLMC extension to a stochastically heated channel flow to provide insight for a study on the design of concentrated solar energy receivers. We demonstrate for the stochastically heated channel flow that the proposed MLMC extension leads to a considerable reduction in total cost (up to 86%) compared to MLMC using only GPUs.
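For reference, the classical single-architecture sample-allocation result that the proposed extension generalizes can be sketched as follows; the symbols $N_\ell$, $V_\ell$, $C_\ell$, and $\varepsilon$ denote the standard MLMC per-level sample count, per-sample variance, per-sample cost, and accuracy tolerance, and are not defined in the abstract itself. Minimizing the total cost $\sum_{\ell=0}^{L} N_\ell C_\ell$ subject to the statistical-error constraint $\sum_{\ell=0}^{L} V_\ell / N_\ell \le \varepsilon^2$ via a Lagrange multiplier yields
\[
  N_\ell \;=\; \left\lceil \varepsilon^{-2} \sqrt{\frac{V_\ell}{C_\ell}} \sum_{k=0}^{L} \sqrt{V_k C_k} \right\rceil ,
  \qquad \ell = 0, \dots, L ,
\]
so samples are concentrated on levels where the variance is large and the cost is small. The extension proposed here solves an analogous convex optimization problem in which samples are allocated over both levels and compute units (CPUs and GPUs).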