Many applications from the fields of seismology and geoengineering require simulations of seismic waves in porous media. Biot's theory of poroelasticity describes the coupling between solid and fluid phases and introduces a stiff reactive source term (Darcy's Law) into the elastodynamic wave equations, thereby increasing computational cost of respective numerical solvers and motivating efficient methods utilising High-Performance Computing.We present a novel realisation of the discontinuous Galerkin scheme with Arbitrary High-Order DERivative time stepping (ADER-DG) that copes with stiff source terms. To integrate this source term with a reasonable time step size, we utilise an element-local space-time predictor, which needs to solve medium-sized linear systems -each with 1,000 to 10,000 unknowns -in each element update (i.e., billions of times). We present a novel block-wise back-substitution algorithm for solving these systems efficiently, thus enabling large-scale 3D simulations. In comparison to LU decomposition, we reduce the number of floating-point operations by a factor of up to 25, when using polynomials of degree 6. The block-wise back-substitution is mapped to a sequence of small matrix-matrix multiplications, for which code generators are available to generate highly optimised code.We verify the new solver thoroughly against analytical and semi-analytical reference solutions in problems of increasing complexity. We demonstrate high-order convergence of the scheme for 3D problems. We verify the correct treatment of point sources and boundary conditions, including homogeneous and heterogeneous full space problems as well as problems with traction-free boundary conditions. In addition, we compare against a finite difference solution for a newly defined 3D layer over half-space problem containing an internal material interface and free surface. We find that extremely high accuracy is required to accurately resolve the slow, diffusive P-wave at a or near a free surface, while we also demonstrate that solid particle velocities are not affected by coarser resolutions. We demonstrate that by using a clustered local time stepping scheme, time to solution is reduced by a factor of 6 to 10 compared to global time stepping. We conclude our study with a scaling and performance analysis on the SuperMUC-NG supercomputer, demonstrating our implementation's high computational efficiency and its potential for extreme-scale simulations.