The purpose of the present study was to use a commercially available grayscale phantom to compare two ultrasound systems in their ability to reproduce clinically relevant low-contrast objects at different sizes and depths, taking into account human observer variability and other methodological issues related to observer performance studies. One high-end and one general-purpose ultrasound scanner from the same manufacturer, using the same probe, were included. The study was intended to simulate the clinical situation in which small low-contrast objects are embedded in relatively homogeneous organs. Images containing 4 mm and 6.4 mm objects of four different contrasts were acquired from the grayscale phantom at different depths. Six observers participated in a 4-alternative forced-choice (4-AFC) study based on 960 images. Case-sample and observer variability were accounted for using bootstrapping. In four of the sixteen depth/size/contrast combinations, observer performance was significantly higher for the high-end scanner. Thus, a grayscale phantom could be used to discriminate between the two evaluated ultrasound systems in terms of their ability to reproduce clinically relevant low-contrast objects. However, the numbers of images and observers in this study were larger than those typically employed for constancy control.

PACS number(s): 87.57.C-, 87.63.dh
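The abstract states that case-sample and observer variability were handled by bootstrapping but does not give the exact resampling scheme. The sketch below shows one common way to do this for a two-system 4-AFC comparison. It assumes hypothetical data laid out as 0/1 scores indexed `[observer][case]` for each scanner. It also assumes that the same observers read both systems, so observers are resampled jointly, and that each system has its own image set, so cases are resampled independently. The function names, the percentile confidence interval, and the 95% level are illustrative choices, not the study's documented method. Note that chance performance in a 4-AFC task is a proportion correct of 0.25.

```python
import random


def _avg(values):
    """Arithmetic mean of a non-empty sequence."""
    values = list(values)
    return sum(values) / len(values)


def proportion_correct(scores):
    """Mean proportion correct (PC) over observers.

    scores: list indexed [observer][case] with 1 = correct, 0 = incorrect.
    """
    return _avg(_avg(obs) for obs in scores)


def bootstrap_pc_difference(scores_a, scores_b, n_boot=2000, alpha=0.05, seed=0):
    """Two-level bootstrap of PC(system A) - PC(system B).

    Hypothetical scheme: each replicate resamples observers with replacement
    (the same observer draw is used for both systems, assuming the same
    readers scored both), then resamples cases with replacement separately
    for each system. Returns the mean bootstrap difference and a percentile
    confidence interval at level 1 - alpha.
    """
    rng = random.Random(seed)
    n_obs = len(scores_a)
    n_a, n_b = len(scores_a[0]), len(scores_b[0])
    diffs = []
    for _ in range(n_boot):
        obs_idx = [rng.randrange(n_obs) for _ in range(n_obs)]
        case_a = [rng.randrange(n_a) for _ in range(n_a)]
        case_b = [rng.randrange(n_b) for _ in range(n_b)]
        pc_a = _avg(_avg(scores_a[o][c] for c in case_a) for o in obs_idx)
        pc_b = _avg(_avg(scores_b[o][c] for c in case_b) for o in obs_idx)
        diffs.append(pc_a - pc_b)
    diffs.sort()
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = max(lo_idx, n_boot - 1 - lo_idx)
    return _avg(diffs), (diffs[lo_idx], diffs[hi_idx])


if __name__ == "__main__":
    # Synthetic example: 6 observers, 30 images per system (hypothetical data).
    gen = random.Random(1)
    a = [[int(gen.random() < 0.80) for _ in range(30)] for _ in range(6)]
    b = [[int(gen.random() < 0.60) for _ in range(30)] for _ in range(6)]
    diff, (lo, hi) = bootstrap_pc_difference(a, b)
    print(f"PC difference {diff:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Under this sketch, one might call a difference significant when the interval excludes zero. Resampling observers as well as cases widens the interval relative to a case-only bootstrap, which is why larger numbers of images and readers are needed to detect modest differences.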