With the advent of virtual reality (VR) technology, spatial audio has been increasingly adopted to evaluate the acoustic environment in soundscape research. It is therefore imperative to assess the quality of commonly used spatial audio reproduction methods to determine their ecological validity. In a subjective evaluation with 30 participants, each participant evaluated four outdoor locations in situ and, on a separate day, the corresponding audio-visual recordings in VR. Three spatial audio reproduction methods were assessed in VR, all down-mixed from first-order ambisonics (FOA) recordings: headphone-based FOA-static binaural and FOA-tracked binaural, and an FOA 2-dimensional (2D) octagonal speaker array. The participants evaluated the acoustic environment in terms of the overall soundscape quality and perceived spatial qualities at each location. Regarding overall soundscape quality, there were no significant differences in the evaluations of sound-source dominance and affective soundscape qualities between the in situ condition and any of the VR methods. However, significant differences were found in the perceived spatial qualities between the three reproduction methods and the in situ condition. Among the source-related spatial attributes, the dominant sounds were perceived as farther away in the virtual than in the in situ evaluations. In the localization of sound sources, both the FOA-tracked binaural and the FOA 2D speaker array exhibited higher spatial acoustic fidelity than the FOA-static binaural. Regarding the environment-related spatial quality attributes, the 2D speaker array reproduction was perceived as more immersive and realistic than the other reproduction methods. Overall, the FOA-tracked binaural appears to exhibit sufficient fidelity for cinematic VR evaluation of soundscapes.