Gravitational wave (GW) standard sirens are well-established probes with which one can measure cosmological parameters, and are complementary to other probes like the cosmic microwave background (CMB) or supernovae standard candles. Here we focus on dark GW sirens, specifically binary black holes (BBHs) for which there is only GW data. Our approach relies on the assumption of a source-frame mass model for the BBH distribution, and we consider four models that are representative of the BBH population observed so far. In addition to inferring cosmological and mass model parameters, we use dark sirens to test modified gravity theories. These theories often predict different GW propagation equations on cosmological scales, leading to a different GW luminosity distance which in some cases can be parametrized by variables Ξ 0 and n. General relativity (GR) corresponds to Ξ 0 = 1. We perform a joint estimate of the population parameters governing mass, redshift, the variables characterizing the cosmology, and the modified GW luminosity distance. We use data from the third LIGO-Virgo-KAGRA observation run (O3) and find − for the four mass models and for three signal-to-noise ratio (SNR) cuts of 10, 11, 12 − that GR is consistently the preferred model to describe all observed BBH GW signals to date. Furthermore, all modified gravity parameters have posteriors that are compatible with the values predicted by GR at the 90% confidence interval (CI). We then focus on future observation runs O4 and O5, and for simplicity consider one specific mass model. We show that there are strong correlations between cosmological, astrophysical and modified gravity parameters. If GR is the correct theory of gravity, and assuming narrow priors on the cosmological parameters, we recover the modified gravity parameter Ξ 0 = 1.47 +0.92 −0.57 with O4, and Ξ 0 = 1.08 +0.27 −0.16 with O4 and O5. We also consider how wide priors on the cosmological parameters impact our results. If, however, Nature follows a modified gravity model with Ξ 0 = 1.8 and n = 1.91, we exclude GR at the 1.7 σ level with O4 and at the 2.3 σ level with O4 and O5 combined.