Similarity learning plays a fundamental role in multimedia retrieval and pattern recognition. Predicting perceptual similarity is challenging because, in most cases, we lack both human-labeled ground-truth data and robust models that mimic human visual perception. Although some studies in the literature have been dedicated to similarity learning, they mainly evaluate whether or not two images are similar, rather than predicting a degree of perceptual similarity that is consistent with human perception. Inspired by the human visual perception mechanism, we propose a novel framework for predicting the perceptual similarity between two texture images. The framework is built on top of Convolutional Neural Networks (CNNs) and considers both powerful features and the perceptual characteristics of contours extracted from the images. The similarity value is computed by aggregating resemblances between the corresponding convolutional layer activations of the two texture maps. Experimental results show that the predicted similarity values are consistent with human-perceived similarity data.
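
As a rough illustration of aggregating resemblances between corresponding convolutional layer activations, the following minimal sketch may help. It is not the framework's actual implementation: the abstract does not specify the resemblance measure or the aggregation rule, so cosine similarity per layer and a weighted mean across layers are assumptions made here, and the names `layer_resemblance` and `aggregate_similarity` are hypothetical. Random arrays stand in for the CNN activations of the two inputs.

```python
import numpy as np


def layer_resemblance(act_a, act_b, eps=1e-12):
    """Cosine similarity between two activation tensors of one layer.

    Assumption: cosine similarity is used as the per-layer resemblance;
    the source does not state which measure is applied.
    """
    a = np.asarray(act_a, dtype=np.float64).ravel()
    b = np.asarray(act_b, dtype=np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / (denom + eps))


def aggregate_similarity(acts_a, acts_b, weights=None):
    """Weighted mean of per-layer resemblances (hypothetical aggregation)."""
    if len(acts_a) != len(acts_b):
        raise ValueError("both inputs need activations from the same layers")
    scores = np.array(
        [layer_resemblance(a, b) for a, b in zip(acts_a, acts_b)]
    )
    if weights is None:
        weights = np.ones(len(scores))  # equal weight for every layer
    weights = np.asarray(weights, dtype=np.float64)
    return float(np.sum(weights * scores) / np.sum(weights))


# Stand-in activations: two layers shaped (channels, height, width).
rng = np.random.default_rng(0)
acts_a = [rng.random((8, 16, 16)), rng.random((16, 8, 8))]
# A slightly perturbed copy plays the role of a perceptually close texture.
acts_b = [a + 0.1 * rng.random(a.shape) for a in acts_a]

print(aggregate_similarity(acts_a, acts_a))  # identical inputs
print(aggregate_similarity(acts_a, acts_b))  # perturbed inputs
```

In practice the activation lists would come from the same convolutional layers of a CNN applied to each input, and the layer weights could be fitted against human similarity judgments; both choices are left open in this sketch.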