The structural image similarity index (SSIM), introduced by Wang and Bovik (IEEE Signal Processing Letters 9-3, pp. 81-84, 2002), measures the similarity between images in terms of luminance, contrast and structure. It has been used successfully to model human visual perception of image distortions and modifications in a wide range of imaging applications. Chang and Zhang (Infrared Physics & Technology 51-2, pp. 83-90, 2007) recently introduced the target structural similarity (TSSIM) clutter metric, which uses the SSIM to quantify the similarity of a target to its background in terms of luminance, contrast and structure. They showed that the TSSIM correlates significantly with mean search time and detection probability. However, it is not immediately obvious to what extent each of the three TSSIM components contributes to this correlation. Here we evaluate the TSSIM by applying it to a set of natural images for which human visual search data are available: the Search_2 dataset. By analyzing the predictive performance of each of the three TSSIM components, we find that it is predominantly the structural similarity component that determines human visual search performance, whereas the luminance and contrast components of the TSSIM show no relation to human performance. Since the structural similarity component of the TSSIM is equivalent to a matched filter, it appears that matched filtering predicts human visual performance when searching for a known target.
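
As an illustration of the three-way decomposition discussed above, the following minimal sketch computes luminance, contrast and structure terms for two equally sized grey-level patches, following the product form of the Wang and Bovik (2002) index. The function name `ssim_components`, the stabilizing constant `eps`, and the use of whole-patch (rather than sliding-window) statistics are assumptions made for this example; they are not taken from the TSSIM definition of Chang and Zhang.

```python
import numpy as np

def ssim_components(x, y, eps=1e-12):
    """Return (luminance, contrast, structure) for two equally sized patches.

    Hypothetical helper: statistics are computed over the whole patch, and
    eps is an assumed small constant that only guards against division by zero.
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    mx, my = x.mean(), y.mean()            # mean grey levels
    sx, sy = x.std(), y.std()              # population standard deviations
    sxy = ((x - mx) * (y - my)).mean()     # population covariance

    luminance = (2 * mx * my + eps) / (mx**2 + my**2 + eps)
    contrast = (2 * sx * sy + eps) / (sx**2 + sy**2 + eps)
    structure = (sxy + eps) / (sx * sy + eps)  # normalized cross-correlation
    return luminance, contrast, structure

# Toy example: the "background" is the "target" with all grey levels doubled.
target = np.array([[1.0, 2.0], [3.0, 4.0]])
background = 2.0 * target
l, c, s = ssim_components(target, background)
print(l, c, s, l * c * s)
```

In this toy case the structure term stays at 1 because the two patches differ only by a scale factor, while the luminance and contrast terms both drop to about 0.8. The structure term is the normalized cross-correlation of the two patches, which is the sense in which it can be read as a matched filter.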