Virtual reality (VR) is a potential assessment format for constructs that depend on certain perceptual characteristics (e.g., a realistic environment and an immersive experience). The purpose of this series of studies was to explore methods of evaluating reliability and validity evidence for virtual reality assessments (VRAs) in comparison with traditional assessments. We intended to provide the basis of a framework for evaluating VRAs, given the important fundamental differences between VRAs and traditional assessment formats. Two commercial off-the-shelf (COTS) games (i.e., Project M and Richie's Plank Experience) were used in Studies 1 and 2, while a game-based assessment (GBA; Balloon Pop, designed for assessment) was used in Study 3. Studies 1 and 2 provided limited evidence for the reliability and validity of the VRAs. However, no meaningful constructs were measured by the VRA in Study 3. Under the validity and reliability methods used in the present studies, the findings offer limited evidence that these VRAs are viable assessment options, which in turn emphasizes the importance of aligning the assessment purpose with the unique advantages of a VR environment.
KEYWORDS
commercial-off-the-shelf (COTS) games, game-based assessment (GBA), reliability, validity, virtual reality (VR)
Practitioner points
• Findings were mixed when correlating VRA scores with comparable assessments of the intended constructs.
• Details are provided on the design and scoring of the presented VRAs.
• Although research using VRAs is still preliminary, there are promising methods through which we might design unique behavior-based evaluations.
INTRODUCTION
The measurement of trait differences using prehire assessments is widely practiced and helps differentiate job candidates based on constructs that are related to job performance, such as personality traits like conscientiousness (r = .31) and aptitudes like cognitive ability (r = .51; Hough et al., 2001). The self-report method has demonstrated accurate measurement for certain constructs in various contexts (Chan, 2009