3D audio spatializers for Virtual Reality (VR) can use the acoustic properties of the surfaces of a visualised game space to calculate a matching reverb. However, this approach could lead to reverbs that impair the tasks performed in such a space, such as listening to speech-based audio. Sound designers would then have to alter the room’s acoustic properties independently of its visualisation to improve speech intelligibility, causing audio-visual incongruency. As user expectation of simulated room acoustics regarding speech intelligibility in VR has not been studied, this study asked participants to rate the congruency of reverbs and their visualisations in 6-DoF VR while listening to speech-based audio. The participants compared unaltered, matching reverbs with sound-designed, mismatching reverbs. The latter feature improved D50s and reduced RT60s at the cost of lower audio-visual congruency. Results suggest participants preferred improved reverbs only when the unaltered reverbs had comparatively low D50s or excessive ringing. Otherwise, too dry or too reverberant reverbs were disliked. The range of expected RT60s depended on the surface visualisation. Differences in timbre between the reverbs may not affect preferences as strongly as shorter RT60s. Therefore, sound designers can intervene and prioritise speech intelligibility over audio-visual congruency in acoustically challenging game spaces.