Qian and Schedl's Depth of Vocabulary Knowledge Test was administered to 31 native-speaker undergraduates under an "unconstrained" condition, in which the number of responses to headwords was unfixed, whereas a corresponding group (n = 36) completed the test under the original "constrained" condition. Results revealed lower accuracy in the unconstrained condition and in paradigmatic versus syntagmatic responses. Native speakers failed to reach the 90% criterion on most unconstrained and many constrained items. Although certain modifications could improve such a test (e.g., eliminating psycholinguistically anomalous headwords, such as adjectives, or presenting responses to headwords discontinuously), two intransigent problems impede test validity. First, collocates in the mental lexicon differ in tightness and vary across dialects, sociolects, and age groups. Second, and more seriously, second-language Depth of Vocabulary Knowledge Tests are likely spot checks of metalinguistic knowledge rather than depth tests that reflect what learners would actually produce in spontaneous utterances.

The last two decades have seen a renewal of interest in vocabulary and a surge in the number of studies on second-language (L2) lexical acquisition. Despite this progress, L2 lexical studies frequently remain uninformed by current psycholinguistic research. Nowhere is this more evident than in lexical testing and in the linguistic and psycholinguistic frameworks within which it is conducted. Standardized, valid, and reliable vocabulary measures are still scarce, particularly for languages spoken outside the few countries able to sustain lexical testing research. As would be expected, the situation is best for English, for which there are now quite a few lexical tests