We present the first systematic comparison between gravitational waveforms emitted by inspiralling, quasi-circular and nonspinning black hole binaries computed with three different approaches: second-order gravitational self-force (2GSF) theory, as implemented in the 1PAT1 model; numerical relativity (NR), as implemented by the SXS collaboration; and the effective one body (EOB) formalism, as implemented in the TEOBResumS waveform model. To compare the models we use both a standard, time-domain waveform alignment and a gauge-invariant analysis based on the dimensionless function Qω(ω) ≡ ω 2 / ω, where ω is the gravitational wave frequency. We analyse the domain of validity of the 1PAT1 model, deriving error estimates and showing that the effects of the final transition to plunge, which the model neglects, extend over a significantly larger frequency interval than one might expect. Restricting to the inspiral regime, we find that, while for mass ratios q = m1/m2 ≤ 10 TEOBResumS is largely indistinguishable from NR, 1PAT1 has a significant dephasing ≳ 1 rad; conversely, for q ≳ 100, 1PAT1 is estimated to have phase errors < 0.1 rad on a large frequency interval, while TEOBResumS develops phase differences ≳ 1 rad with it. Most crucially, on that same large frequency interval we find good agreement between TEOBResumS and 1PAT1 in the intermediate regime 15 ≲ q ≲ 64, with < 0.5 rad dephasing between them. A simple modification to the TEOBResumS flux further improves this agreement for q ≳ 30, reducing the dephasing to ≈ 0.27 rad even at q = 128. While our analysis points to the need for more highly accurate, long-inspiral, NR simulations for q ≳ 15 to precisely quantify the accuracy of EOB/2GSF waveforms, we can clearly identify the primary sources of error and routes to improvement of each model. In particular, our results pave the way for the construction of GSF-informed EOB models for both intermediate and extreme mass ratio inspirals for the next generation of gravitational wave detectors.