Abstract. Sparse grids have become an important tool to reduce the number of degrees of freedom of discretizations of moderately high-dimensional partial differential equations; however, the reduction in degrees of freedom comes at the cost of an almost dense and unconventionally structured system of linear equations. To guarantee overall efficiency of the sparse grid approach, special linear solvers are required. We present a multigrid method that exploits the sparse grid structure to achieve an optimal runtime that scales linearly with the number of sparse grid points. Our approach is based on a novel decomposition of the right-hand sides of the coarse grid equations that leads to a reformulation in so-called auxiliary coefficients. With these auxiliary coefficients, the right-hand sides can be represented in a nodal point basis on low-dimensional full grids. Our proposed multigrid method directly operates in this auxiliary coefficient representation, circumventing most of the computationally cumbersome sparse grid structure. Numerical results on nonadaptive and spatially adaptive sparse grids confirm that the runtime of our method scales linearly with the number of sparse grid points and they indicate that the obtained convergence factors are bounded independently of the mesh width. 1. Introduction. Many problems in computational science and engineering, e.g., in financial engineering, quantum mechanics, and data-driven prediction, lead to high-dimensional function approximations [23]. We consider here the case where the function is not given explicitly at selected sample points but implicitly as the solution of a high-dimensional partial differential equation (PDE). Standard discretizations on isotropic uniform grids, so-called full grids, become computationally infeasible because the number of degrees of freedom grows exponentially with the dimension. Sparse grid discretizations are a versatile way to circumvent this exponential growth [4]. If the function satisfies certain smoothness assumptions [4, section 3.1] sparse grids reduce the number of degrees of freedom to depend only logarithmically on the dimension; however, the discretization of a PDE on a sparse grid with the (piecewise linear) hierarchical basis results in an almost dense and unconventionally structured system of linear equations. To be computationally efficient solvers must take this structure into account. We develop a multigrid method that exploits the structure of sparse grid discretizations to achieve an optimal runtime that scales linearly with the number of sparse grids points.Multigrid methods have been extensively studied in the context of sparse grids. They are used either as preconditioners [1,11,12,15,16] or within real multigrid schemes [2,3,6,26,33]. We first note that certain problems allow for sparse grid dis-