Virtual reality exposure therapy (VRET) can have a significant impact towards assessing and potentially treating various anxiety disorders. One of the main strengths of VRET systems is that they provide an opportunity for a psychologist to interact with virtual 3D environments and change therapy scenarios according to the individual patient’s needs. However, to do this efficiently the patient’s anxiety level should be tracked throughout the VRET session. Therefore, in order to fully use all advantages provided by the VRET system, a mental stress detection system is needed. The patient’s physiological signals can be collected with wearable biofeedback sensors. Signals like blood volume pressure (BVP), galvanic skin response (GSR), and skin temperature can be processed and used to train the anxiety level classification models. In this paper, we combine VRET with mental stress detection and highlight potential uses of this kind of VRET system. We discuss and present a framework for anxiety level recognition, which is a part of our developed cloud-based VRET system. Physiological signals of 30 participants were collected during VRET-based public speaking anxiety treatment sessions. The acquired data were used to train a four-level anxiety recognition model (where each level of ‘low’, ‘mild’, ‘moderate’, and ‘high’ refer to the levels of anxiety rather than to separate classes of the anxiety disorder). We achieved an 80.1% cross-subject accuracy (using leave-one-subject-out cross-validation) and 86.3% accuracy (using 10 × 10 fold cross-validation) with the signal fusion-based support vector machine (SVM) classifier.