Visual working memory (VWM) is a central bottleneck in human information processing. Its capacity is most often measured as the number of individual-item representations that VWM can hold (k). In the standard task used to estimate k, an array of highly discriminable colour patches is maintained and, after a short retention interval, compared to a test display (change detection). Recent research has shown that with more complex, structured displays, change-detection performance is supported not only by individual-item representations but also by ensemble representations that arise from spatial subgrouping. Here, by asking participants to additionally localize the change, we find evidence for an influence of ensemble representations even in the very simple, unstructured displays of the colour-patch change-detection task. Critically, the pure-item models from which standard formulae for k are derived do not consider ensemble representations and therefore potentially overestimate k. To gauge this overestimation, we develop an item-plus-ensemble model of change detection and change localization. Estimates of k from this new model are about 1 item (~30%) lower than the estimates from traditional pure-item models, even when derived from the same data sets.
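For orientation, the pure-item estimates of k referred to above can be sketched in a few lines. The abstract does not say which formulae are meant; the sketch assumes the two commonly used in the change-detection literature, Cowan's k (single-probe test displays) and Pashler's k (whole-display test displays). The function names and the example rates are hypothetical, and the item-plus-ensemble model itself is not reproduced here.

```python
def cowan_k(set_size, hit_rate, false_alarm_rate):
    """Cowan's k (assumed single-probe test): k = N * (H - FA)."""
    return set_size * (hit_rate - false_alarm_rate)


def pashler_k(set_size, hit_rate, false_alarm_rate):
    """Pashler's k (assumed whole-display test): k = N * (H - FA) / (1 - FA)."""
    if false_alarm_rate >= 1.0:
        raise ValueError("false_alarm_rate must be below 1")
    return set_size * (hit_rate - false_alarm_rate) / (1.0 - false_alarm_rate)


if __name__ == "__main__":
    # Hypothetical data: 6 items, 75% hits, 25% false alarms.
    print(cowan_k(6, 0.75, 0.25))
    print(pashler_k(6, 0.75, 0.25))
```

Both functions attribute every correct detection beyond guessing to stored individual items, which is the assumption the item-plus-ensemble model relaxes.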