The Two‐Out‐of‐Five test is a method of unspecified difference testing. Although its low guessing probability (1/10) suggests that it might have high power, the theoretical underpinnings of the method have not yet been investigated. In this article, we offer the first such investigation, using Thurstonian analysis. The analysis reveals that the standard form of the Two‐Out‐of‐Five test is more statistically powerful than the Triangle test, but not as powerful as the Tetrad test. We then propose a new way of scoring Two‐Out‐of‐Five data that yields a test with higher power and lower sample size requirements than the Tetrad test, under the assumption that evaluating an additional stimulus introduces no additional noise. This last result is achieved without any experimental modification of the Two‐Out‐of‐Five protocol. We give tables for estimating the Thurstonian measure of sensory effect size, δ, for calculating the error in such estimates, and for recommended sample sizes. Finally, we caution against incorrect instructions in the Two‐Out‐of‐Five test: if respondents are asked simply to identify the two most similar samples, the resulting test has almost no power.
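The guessing probability of 1/10 quoted above can be checked by enumeration: a respondent who picks two of the five samples at random has ten equally likely selections, only one of which matches the odd pair. The following Python sketch is illustrative only; the function name and the labeling of the samples are assumptions made here, not part of the article.

```python
from fractions import Fraction
from itertools import combinations


def guessing_probability(n_samples: int = 5, n_odd: int = 2) -> Fraction:
    """Chance of picking exactly the odd samples by pure guessing.

    Hypothetical helper: samples are labeled 0..n_samples-1 and the
    first n_odd labels are arbitrarily treated as the odd ones.
    """
    true_odd = set(range(n_odd))
    # Every way a respondent could select n_odd samples from the set.
    selections = list(combinations(range(n_samples), n_odd))
    hits = sum(1 for s in selections if set(s) == true_odd)
    return Fraction(hits, len(selections))


print(guessing_probability())  # 1/10
```

Calling `guessing_probability(3, 1)` gives the corresponding value for a protocol with one odd sample among three, such as the Triangle test.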
Practical Applications
This article shows that the standard form of the Two‐Out‐of‐Five test is more powerful than the Triangle test, but is not as powerful as the Tetrad test. The article then proposes a new method of scoring Two‐Out‐of‐Five data that requires smaller sample sizes than the Tetrad test, under the assumption that the evaluation of an additional stimulus does not lead to an increase in perceptual noise. When this alternate method of scoring the data is used, the test is called the Two‐Out‐of‐Five test with forgiveness. For both forms of the Two‐Out‐of‐Five test, the article gives tables for estimating δ, the Thurstonian measure of sensory difference, together with B values that allow practitioners to calculate the variance of their estimates. Tables of recommended sample sizes are also given in each case. Finally, it is noted that care must be taken to give correct instructions to respondents: a test with very low power results if respondents are simply asked to identify the two most similar stimuli.
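To illustrate how the guessing probability enters the analysis of the standard form of the test, the sketch below computes the smallest number of correct responses that would be significant in a one-sided exact binomial test against guessing. The choice of an exact binomial test, the significance level of 0.05, and the function names are assumptions made for illustration; the sketch does not reproduce the article's tables, and it says nothing about power, which depends on the Thurstonian relationship between δ and the proportion of correct responses.

```python
from math import comb
from typing import Optional


def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_count(n: int, p_guess: float = 0.1, alpha: float = 0.05) -> Optional[int]:
    """Smallest number of correct responses significant at level alpha.

    Hypothetical helper: returns None when even n correct responses
    out of n would not reach significance.
    """
    for k in range(n + 1):
        if binom_tail(k, n, p_guess) <= alpha:
            return k
    return None


# Standard Two-Out-of-Five scoring uses p_guess = 1/10; the Triangle
# and Tetrad tests both have a guessing probability of 1/3.
for n in (20, 40):
    print(n, critical_count(n, 1 / 10), critical_count(n, 1 / 3))
```

Because the guessing probability is lower, fewer correct responses are needed for significance than in a test with a guessing probability of 1/3 at the same sample size. This alone does not determine which test is more powerful.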