This paper presents a multilevel convergence framework for multigrid-reduction-intime (MGRIT) as a generalization of previous two-grid estimates. The framework provides a priori upper bounds on the convergence of MGRIT V-and F-cycles, with different relaxation schemes, by deriving the respective residual and error propagation operators. The residual and error operators are functions of the time stepping operator, analyzed directly and bounded in norm, both numerically and analytically. We present various upper bounds of different computational cost and varying sharpness. These upper bounds are complemented by proposing analytic formulae for the approximate convergence factor of V-cycle algorithms that take the number of fine grid time points, the temporal coarsening factors, and the eigenvalues of the time stepping operator as parameters.The paper concludes with supporting numerical investigations of parabolic (anisotropic diffusion) and hyperbolic (wave equation) model problems. We assess the sharpness of the bounds and the quality of the approximate convergence factors. Observations from these numerical investigations demonstrate the value of the proposed multilevel convergence framework for estimating MGRIT convergence a priori and for the design of a convergent algorithm. We further highlight that observations in the literature are captured by the theory, including that two-level Parareal and multilevel MGRIT with F-relaxation do not yield scalable algorithms and the benefit of a stronger relaxation scheme. An important observation is that with increasing numbers of levels MGRIT convergence deteriorates for the hyperbolic model problem, while constant convergence factors can be achieved for the diffusion equation. The theory also indicates that L-stable Runge-Kutta schemes are more amendable to multilevel parallel-in-time integration with MGRIT than A-stable Runge-Kutta schemes.Key words. multilevel convergence theory, multigrid-reduction-in-time (MGRIT), parallel-intime, multigrid, analytic upper bounds, a priori estimates 1. Introduction. Modern computer architectures enable massively parallel computations for systems under numerical investigation. While clock rates of recent highperformance computing architectures have largely become stagnant, increased concurrency continues to reduce the time-to-solution, allowing for increased complexity of computational models and accuracy of computed quantities.Spatial domain decomposition (DD) methods are a wide-spread class of parallelization techniques to exploit parallelism in numerical simulations. Many DD methods are straightforward to implement and scalable in parallel up to the point that communication tasks become dominant over computation tasks. Thus, spatial parallelism may saturate without exploiting the full potential of the available hardware.Parallel-in-time methods [37,15] increase the amount of parallelism that can be exploited by introducing parallelism in the temporal domain. Many such methods exist, including waveform relaxation [35,47], space-time ...