Abstract. We show feasibility and benefits of porting an adaptive multi-scale kinetic-fluid code to CPU-GPU systems. Challenges are due to the irregular data access for adaptive Cartesian mesh, vast difference of computational cost between kinetic and fluid cells, and desire to evenly load all CPUs and GPUs. Our Unified Flow Solver (UFS) combines Adaptive Mesh Refinement (AMR) with automatic cell-by-cell selection of kinetic or fluid solvers based on continuum breakdown criteria. Using GPUs enables hybrid simulations of mixed rarefied-continuum flows with a million of Boltzmann cells with 24x24x24 velocity mesh. We describe the implementation of CUDA kernels for three modules in UFS: the direct Boltzmann solver using discrete velocity method, the Direct Simulation Monte Carlo (DSMC) solver, and a mesoscopic solver based on Lattice Boltzmann Method, all using octree Cartesian mesh. Double digit speedups on single GPU and good scaling for multiGPUs have been demonstrated.