A fast micromagnetic simulator (FastMag) for general problems is presented. FastMag solves the Landau-Lifshitz-Gilbert equation and handles problems ranging from small to very large at high speed. The simulator derives its performance from efficient methods for evaluating the effective field and from implementations on massively parallel Graphics Processing Unit (GPU) architectures. FastMag discretizes the computational domain into tetrahedral elements and is therefore highly flexible for general problems. The magnetostatic field is computed via the superposition principle for both the volume and surface parts of the computational domain. This is accomplished by implementing efficient quadrature rules, together with analytical integration for overlapping elements, in which the integral kernel is singular. The discretized superposition integrals are then computed using a non-uniform grid interpolation method, which evaluates the field from N sources at N collocated observers in O(N) operations. This approach handles any uniform or non-uniform shapes, makes it easy to calculate the field outside the magnetized domains, does not require solving a linear system of equations, and requires little memory. FastMag is implemented on GPUs, with GPU-CPU speed-ups of two orders of magnitude. Simulations are shown of a large array and a recording head fully discretized down to the exchange length, with over a hundred million tetrahedral elements, on an inexpensive desktop computer.
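
The abstract names the Landau-Lifshitz-Gilbert equation but does not state its form or the time integrator used. As a minimal illustration only, the sketch below evaluates the equation in its standard Landau-Lifshitz form for an array of unit magnetization vectors and advances it with an explicit Euler step followed by renormalization. The function names `llg_rhs` and `llg_step`, the damping value `alpha = 0.1`, the value of `GAMMA`, and the choice of integrator are assumptions made for this example. They are not taken from FastMag, and the sketch runs with NumPy on a CPU rather than on a GPU.

```python
import numpy as np

# Gyromagnetic ratio in m/(A*s); a commonly quoted value, assumed for this sketch.
GAMMA = 2.211e5


def llg_rhs(m, h_eff, alpha=0.1, gamma=GAMMA):
    """Right-hand side of the LLG equation in Landau-Lifshitz form.

    m      : (N, 3) array of unit magnetization vectors
    h_eff  : (N, 3) array of effective fields in A/m
    Returns dm/dt as an (N, 3) array.
    """
    m_x_h = np.cross(m, h_eff)        # precession term, m x H
    m_x_m_x_h = np.cross(m, m_x_h)    # damping term, m x (m x H)
    return -gamma / (1.0 + alpha**2) * (m_x_h + alpha * m_x_m_x_h)


def llg_step(m, h_eff, dt, alpha=0.1, gamma=GAMMA):
    """One explicit Euler step, then renormalize each vector to unit length.

    Illustrative only: a production solver would use a higher-order
    or implicit scheme.
    """
    m_new = m + dt * llg_rhs(m, h_eff, alpha, gamma)
    return m_new / np.linalg.norm(m_new, axis=1, keepdims=True)


if __name__ == "__main__":
    # A single moment along x relaxing toward a field applied along z.
    m = np.array([[1.0, 0.0, 0.0]])
    h = np.array([[0.0, 0.0, 1.0e5]])
    for _ in range(1000):
        m = llg_step(m, h, dt=1.0e-13)
    print(m)
```

In a full simulator, `h_eff` would be recomputed at every step from the current magnetization, and that evaluation (in particular the magnetostatic part) is the cost the abstract's O(N) method addresses.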