With the huge amount of digital images now available, effective methods for retrieving the desired images are essential. The proposed approach is based on an analogy between content-based image retrieval and text retrieval. Its aim is to build a meaningful mid-level representation of images that is later used to match a query image against the other images in the target database. The approach first constructs visual words by extracting local patches and fusing their descriptors. Second, we introduce a new method based on multilayer pLSA to eliminate the noisiest words generated by the vocabulary-building process. Third, we introduce a new spatial weighting scheme that weights each visual word according to its probability of belonging to each of n Gaussians. Finally, we construct visual phrases from groups of visual words that are involved in strong association rules. Experimental results show that our approach outperforms traditional image retrieval techniques.
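The pipeline outlined above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes synthetic stand-ins for the fused patch descriptors and their (x, y) positions, uses k-means as the vocabulary quantiser, and realises the spatial weighting step by fitting n Gaussians to patch positions and weighting each visual-word occurrence by its posterior probability under each component. All variable names and parameter values are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-ins for local patch descriptors (e.g. fused
# appearance features) and their (x, y) positions within one image.
descriptors = rng.normal(size=(200, 16))
positions = rng.uniform(0, 1, size=(200, 2))

# Vocabulary construction: quantise descriptors into visual words.
n_words = 10
kmeans = KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(descriptors)
words = kmeans.labels_  # visual-word index of each patch

# Spatial weighting: fit n Gaussians to the patch positions and take,
# for each patch, its posterior probability under each component.
n_gaussians = 3
gmm = GaussianMixture(n_components=n_gaussians, random_state=0).fit(positions)
posteriors = gmm.predict_proba(positions)  # shape (200, n_gaussians)

# Accumulate one visual-word histogram per Gaussian, each occurrence
# weighted by its membership probabilities.
histogram = np.zeros((n_gaussians, n_words))
for w, p in zip(words, posteriors):
    histogram[:, w] += p

print(histogram.shape)  # one weighted bag-of-words per Gaussian
```

Because the posteriors of each patch sum to one, the weighted histograms together preserve the total patch count; the resulting (n_gaussians x n_words) representation can then be flattened and compared between a query image and the database images. The pLSA-based word filtering and the association-rule mining of visual phrases would operate on such histograms and on word co-occurrences, respectively.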