Most spatial audio reproduction systems require that all loudspeakers be equidistant from the listener, a condition that is difficult to achieve in real rooms. In traditional Ambisonics this arises because the spherical harmonic functions used to encode the spatial sound field are orthonormal over a sphere, and because loudspeaker proximity is not fully addressed. Recently, significant progress towards lifting this restriction has been made through the theory of sound field synthesis, which formalizes various spatial audio systems in a mathematical framework based on the single-layer potential. This approach has shown many benefits, but the theory, which treats audio rendering as a sound-soft scattering problem, can appear one step removed from physical reality and also possesses frequencies at which the solution is non-unique. In the time-domain Boundary Element Method, approaches that address such non-uniqueness amount to statements that test the flow of acoustic energy rather than considering pressure alone. This paper applies that notion to spatial audio rendering by re-examining the Kirchhoff-Helmholtz integral equation as a wave-matching metric, and suggests a physical interpretation of its kernel in terms of the common acoustic power flux density between waves. It is shown that the spherical basis functions (spherical harmonics multiplied by spherical Bessel or Hankel functions) are orthogonal over an arbitrary surface with respect to this metric. Finally, other applications are discussed, including the design of high-order microphone arrays and the coupling of virtual acoustic models to auralization hardware.
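
As an illustration of the orthogonality claim, the following sketch shows one way a power-flux metric of this kind can be written down and why it is independent of the surface shape. It is not necessarily the exact form used in the paper: the time convention $e^{i\omega t}$, the real wavenumber $k$, the fluid density $\rho$, the outward unit normal $\mathbf{n}$, the symbol $W_{ab}$, and the factor $1/4$ are all assumptions introduced here. The surface $S$ is assumed closed, to enclose the origin, and to bound a source-free region together with a sphere centred on the origin.

```latex
% Illustrative sketch only. Assumed (not from the source): e^{i\omega t} convention,
% real k, the name W_{ab}, and the 1/4 normalisation.
\begin{align}
  % Particle velocity from Euler's equation under the assumed convention
  \mathbf{v} &= \frac{i}{\omega\rho}\,\nabla p \\
  % Mutual (cross-term) power flux of two waves p_a, p_b through S
  W_{ab} &= \frac{1}{4}\oint_S \left( p_b^{*}\,\mathbf{v}_a + p_a\,\mathbf{v}_b^{*} \right)\cdot\mathbf{n}\,\mathrm{d}S
          = \frac{i}{4\omega\rho}\oint_S \left( p_b^{*}\,\frac{\partial p_a}{\partial n}
            - p_a\,\frac{\partial p_b^{*}}{\partial n} \right)\mathrm{d}S \\
  % The integrand is divergence-free wherever both waves satisfy the Helmholtz equation,
  % so S may be deformed to a sphere of radius r without changing W_{ab}
  \nabla\cdot\left( p_b^{*}\,\nabla p_a - p_a\,\nabla p_b^{*} \right)
         &= p_b^{*}\,\nabla^{2} p_a - p_a\,\nabla^{2} p_b^{*}
          = -k^{2} p_b^{*} p_a + k^{2} p_a p_b^{*} = 0 \\
  % On the sphere, with p_a = f_n(kr) Y_n^m and p_b = g_{n'}(kr) Y_{n'}^{m'}
  % (Y orthonormal on the unit sphere; f, g spherical Bessel or Hankel functions)
  W_{ab} &= \frac{i\,k\,r^{2}}{4\omega\rho}
            \left[ g_{n'}^{*}(kr)\,f_n'(kr) - f_n(kr)\,\bigl(g_{n'}^{*}\bigr)'(kr) \right]
            \delta_{nn'}\,\delta_{mm'}
\end{align}
```

With $p_a = p_b$ the form reduces to $W_{aa} = \oint_S \tfrac{1}{2}\,\mathrm{Re}(p\,\mathbf{v}^{*})\cdot\mathbf{n}\,\mathrm{d}S$, the time-averaged power radiated through $S$. As a check under the same assumptions, taking $f_n = g_n = h_n^{(2)}$ (outgoing for $e^{i\omega t}$) and using the Wronskian $h_n^{(1)} h_n^{(2)\prime} - h_n^{(1)\prime} h_n^{(2)} = -2i/x^{2}$ gives $W_{aa} = 1/(2\omega\rho k)$, which is positive and independent of $r$, whereas two regular waves ($f_n = g_n = j_n$) give zero net flux.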