We describe a high-performance implementation of the lattice Boltzmann method (LBM) for sparse geometries on graphics processors. In our implementation, we cover the whole geometry with a uniform mesh of small tiles and carry out the calculations for each tile independently, synchronizing data at the tile edges. For this method, we provide both a theoretical analysis of complexity and results from real implementations for two-dimensional (2D) and three-dimensional (3D) geometries. Based on the theoretical model, we show that tiles offer significantly smaller bandwidth overheads than solutions based on indirect addressing. For 2D lattice arrangements, a reduction in memory usage is also possible, although at the cost of diminished performance. We achieved a performance of 682 MLUPS on a GTX Titan (72% of peak theoretical memory bandwidth) for the D3Q19 lattice arrangement and double-precision data.

In the LBM, as in conventional CFD, the geometry and the initial and boundary conditions must be specified to solve the initial value problem. The computational domain is uniformly partitioned, with computational nodes placed at the vertices of adjacent cells; these nodes form the lattice. One of the lattice structures used in this study, D2Q9, is presented in Fig. 1. The first part of the notation (D2) gives the space dimension, while the second (Q9) gives the number of lattice links.

Let f_i represent the probability distribution function (PDF) along the i-th lattice direction, and let δ_x and δ_t denote the lattice spacing and the lattice time step, respectively.

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.
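The relation between the reported 682 MLUPS and the quoted 72% of peak memory bandwidth can be checked with a short calculation. The sketch below is illustrative only and rests on two assumptions that are not stated in the text above: a minimal-traffic model in which each node update reads and writes every one of the Q distribution values exactly once, and a peak bandwidth of 288.4 GB/s for the GTX Titan, taken from the vendor's published specification. The function name `bandwidth_utilization` is hypothetical.

```python
def bandwidth_utilization(mlups, q, bytes_per_value, peak_gb_s):
    """Return (achieved GB/s, fraction of peak) under a minimal-traffic model."""
    # Assumption: one lattice update reads q values and writes q values,
    # with no extra traffic for node types, indices, or tile edges.
    bytes_per_update = 2 * q * bytes_per_value
    # MLUPS = million lattice updates per second.
    achieved_gb_s = mlups * 1e6 * bytes_per_update / 1e9
    return achieved_gb_s, achieved_gb_s / peak_gb_s


# D3Q19 lattice, double precision (8 bytes), reported 682 MLUPS.
# The 288.4 GB/s peak is an assumed specification value, not from the paper.
achieved, fraction = bandwidth_utilization(682, 19, 8, 288.4)
print(f"{achieved:.1f} GB/s, {fraction:.1%} of peak")
# prints "207.3 GB/s, 71.9% of peak"
```

Under these assumptions, the estimate agrees with the 72% figure to within rounding. Any additional traffic, such as node-type data or tile-edge synchronization, would make the true utilization slightly higher than this lower bound.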