We describe a system for automatically scoring a vocabulary item type that asks test-takers to use two specific words in writing a sentence based on a picture. The system consists of a rule-based component and a machine learned statistical model which uses a variety of construct-relevant features. Specifically, in constructing the statistical model, we investigate if grammar, usage, and mechanics features developed for scoring essays can be applied to short answers, as in our task. We also explore new features reflecting the quality of the collocations in the response, as well as features measuring the consistency of the response to the picture. System accuracy in scoring is 15 percentage points greater than the majority class baseline and 10 percentage points less than human performance.