Linn and Petersen (1985) separated spatial ability tests into three categories based on effect size. One category was spatial visualization, the ability to manipulate complex spatial information when several stages are needed to produce the correct solution. Gender differences in spatial visualization had a small effect size and were statistically nonsignificant. Another category was spatial perception, the ability to determine spatial relations despite irrelevant information. Gender differences in spatial perception had a medium effect size and were statistically significant. The third category was mental rotation, the ability to rotate two- or three-dimensional figures quickly and accurately in imagination and to compare them with other, similar figures. Mental rotation was the only category to yield a large effect size for gender differences, and this difference was also statistically significant.

More recently, Voyer, Voyer, and Bryden (1995) conducted a meta-analysis of the same three categories of spatial ability and found similar results. Spatial visualization tasks showed significant gender differences only for participants over 18 years old; for younger participants, the differences were not significant. Spatial perception tasks showed a significant gender difference for participants aged 13 and older. Mental rotation tasks showed gender differences for participants of all ages. For all three categories, effect size increased significantly and linearly with age, possibly indicating that sexual differentiation plays a role in gender differences in spatial ability.
They further found that the largest effect was for the Mental Rotations Test (MRT; Vandenberg & Kuse, 1978).

Wraga, Duncan, Jacobs, Helt, and Church (2006) report that, although traditional gender gaps in cognitive performance have narrowed over many years, mental rotation tasks have consistently yielded large and reliable gender differences of about 1 standard deviation, with no significant reduction. At least one study, however, found that under certain circumstances gender differences on the MRT did not hold.

Goldstein, Haldane, and Mitchell (1990) administered the MRT under two different sets of instructions. In Experiment 1, participants were allowed the standard 3 min for each 10-item half of the test. The standard method of scoring the MRT counts as correct only those items for which both correct alternatives are marked (maximum possible score of 20). Under this method, the probability of scoring a point on an item by chance is .16. Instead of the standard method, Goldstein et al. calculated scores in two alternative ways. The first was to count the number of correct alternatives chosen (maximum possible score of 40). Under this method, the probability of scoring a point by chance is .5 for the first guess on an item. Goldstein et al.'s second scoring method was to derive the ratio of the number of correct alternatives chosen ...
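The chance probabilities quoted for the two scoring rules follow from simple counting. The Python sketch below illustrates that arithmetic. It assumes that each MRT item offers four alternatives, two of which are correct, and that respondents mark two alternatives per item. The item layout (correct alternatives 0 and 1) and the function names `standard_score` and `alternatives_score` are hypothetical illustrations, not Goldstein et al.'s actual procedure. The ratio-based method is not sketched.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical item layout (assumption): 4 alternatives per item,
# exactly 2 of which are correct; here they are alternatives 0 and 1.
ALTERNATIVES = range(4)
CORRECT = frozenset({0, 1})


def standard_score(marked_per_item):
    """Standard scoring: 1 point per item only if both correct alternatives are marked."""
    return sum(1 for marked in marked_per_item if CORRECT <= set(marked))


def alternatives_score(marked_per_item):
    """Alternative scoring: 1 point per correct alternative marked (max 2 per item)."""
    return sum(len(CORRECT & set(marked)) for marked in marked_per_item)


# Chance level under standard scoring: of the 6 possible pairs of
# alternatives, only the one correct pair earns the point.
pairs = list(combinations(ALTERNATIVES, 2))
p_standard = Fraction(sum(1 for p in pairs if frozenset(p) == CORRECT), len(pairs))

# Chance level for the first guess under alternative scoring:
# 2 of the 4 alternatives earn a point.
p_first_guess = Fraction(len(CORRECT), len(ALTERNATIVES))

# Three hypothetical responses: both correct, one correct, none correct.
responses = [(0, 1), (0, 2), (2, 3)]
print(standard_score(responses), alternatives_score(responses))
print(p_standard, p_first_guess)
```

With these assumptions, the enumeration gives 1/6 for a random pair under standard scoring and 1/2 for a first guess under the alternative rule. A perfect 20-item test scores 20 under the standard rule and 40 under the count-of-alternatives rule.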