Extracting visual features for image retrieval in a way that mimics human cognition remains a challenge. The opponent color and HSV color spaces approximate human visual perception well. In this paper, we improve and extend the color difference histogram (CDH) method with a multi-stage model that extracts and represents image content in a manner consistent with human perception. Our main contributions are as follows: (1) We propose a visual feature descriptor for representing an image. It retains the advantages of histogram-based methods and is consistent with visual perception factors such as spatial layout, intensity, edge orientation, and the opponent colors. (2) We improve the distance formula of the CDH so that the similarity between images can be adjusted through two parameters. The proposed method is effective for similar-image retrieval rather than instance retrieval. Experiments on four benchmark datasets demonstrate that the proposed method can describe color, texture, and spatial features, and that it performs significantly better than the color volume histogram, the color difference histogram, the local binary pattern histogram, and the multi-texton histogram, as well as some SURF-based approaches.
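
As a purely illustrative aid, the two color spaces named above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the descriptor proposed in this paper: the opponent transform shown is one commonly used formulation, the bin counts are arbitrary, and the function names `rgb_to_opponent` and `hsv_histogram` are hypothetical.

```python
import colorsys
import math


def rgb_to_opponent(r, g, b):
    """Map RGB values in [0, 1] to three opponent channels.

    Assumption: this is one common opponent-color formulation,
    not necessarily the one used by the proposed method.
    """
    o1 = (r - g) / math.sqrt(2)          # red-green channel
    o2 = (r + g - 2 * b) / math.sqrt(6)  # yellow-blue channel
    o3 = (r + g + b) / math.sqrt(3)      # intensity channel
    return o1, o2, o3


def hsv_histogram(pixels, h_bins=8, s_bins=3, v_bins=3):
    """Normalized histogram over uniformly quantized HSV values.

    `pixels` is an iterable of (r, g, b) tuples in [0, 1].
    The bin counts are illustrative defaults, not values from the paper.
    """
    pixels = list(pixels)
    hist = [0.0] * (h_bins * s_bins * v_bins)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        # Clamp so that a value of exactly 1.0 falls in the last bin.
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1
    total = len(pixels)
    return [count / total for count in hist] if total else hist
```

For a neutral gray pixel, the two chromatic opponent channels are zero and only the intensity channel is non-zero, which is the separation of color from intensity that motivates using such spaces for perceptual descriptors.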