A situated learning environment is crucial for developing speaking skills because it lets language learners apply those skills in context, which helps them adapt their language use to different situations and improves their proficiency and communicative effectiveness. Although features of situated learning environments have been explored across different subject areas and on many platforms, there is limited research on their application to language learning within virtual reality (VR) environments. This convergent mixed-methods study adopts a situated learning framework to examine the impact of situated learning on learners’ English-speaking performance, specifically in fluency, vocabulary, pronunciation, and grammar, and to explore learners’ perceptions of instruction based on the situated learning approach. Sixteen first-year English majors at a university in China participated in eight role-play speaking classes using the desktop-based VR application Immerse. Data were collected through pre- and post-assessments of speaking performance and semi-structured interviews with six participants. Paired-samples t-tests were used to assess differences in overall speaking performance and in each of the four areas, and thematic analysis was used to explore learners’ perceptions of the instruction. Quantitative findings show a significant improvement in learners’ speaking performance (t(15) = 7.41, p < .001, Cohen’s d = 1.82), with notable progress in fluency, vocabulary, pronunciation, and grammar. Thematic analysis of the qualitative data indicated that the authenticity of the context and activities, the collaborative nature of the tasks, the expert guidance, and the opportunities for reflection all contributed to a comprehensive learning experience that aligns well with the principles of situated learning.