Image retrieval using a textual query remains a major challenge, mainly due to the subjectivity of human perception and the imprecision of image annotations. These drawbacks can be overcome by focusing on the content of images rather than on their textual descriptions. Traditional feature extraction techniques demand expert knowledge to select a limited set of feature types and are sensitive to changing imaging conditions. Deep feature extraction using Convolutional Neural Networks (CNNs) addresses these drawbacks, as such networks learn feature representations automatically. This work carries out a detailed performance comparison of various pretrained CNN models for feature extraction. Features are extracted from men's footwear and women's clothing datasets using the VGG16, VGG19, InceptionV3, Xception, and ResNet50 models. These extracted features are then used for classification with SVM, Random Forest, and K-Nearest Neighbors classifiers. The results show that VGG19, InceptionV3, and Xception features perform best, achieving an image classification accuracy of 97.5%. These results are further corroborated by comparing image retrieval efficiency using the extracted features and similarity metrics. This work also compares the accuracy obtained with features extracted by the selected pretrained CNN models against results obtained using conventional classification techniques on the CIFAR-10 dataset. Features extracted using CNNs can be applied in image-based systems such as recommender systems, where images must be analyzed to generate item profiles.
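To make the pipeline concrete, the following is a minimal sketch of the feature-extraction, classification, and retrieval steps outlined above, assuming TensorFlow/Keras and scikit-learn. The file names in img_paths, labels, and query.jpg are hypothetical placeholders, not the actual datasets used in this work.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input
from tensorflow.keras.preprocessing import image

# VGG19 pretrained on ImageNet; drop the classifier head and
# global-average-pool the last convolutional maps (512-d feature vector).
model = VGG19(weights="imagenet", include_top=False, pooling="avg")

def extract_features(img_paths):
    """Return one 512-d deep feature vector per image path."""
    batch = []
    for path in img_paths:
        img = image.load_img(path, target_size=(224, 224))  # VGG input size
        batch.append(image.img_to_array(img))
    return model.predict(preprocess_input(np.array(batch)), verbose=0)

# Hypothetical placeholders for the dataset images and class labels.
img_paths = ["shoe_001.jpg", "shoe_002.jpg"]  # ... paths to dataset images
labels = [0, 1]                               # ... corresponding labels

features = extract_features(img_paths)
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42)

# Classification on the deep features (SVM; Random Forest or KNN analogous).
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))

# Retrieval: rank gallery images by cosine similarity to a query feature.
query = extract_features(["query.jpg"])
ranking = np.argsort(-cosine_similarity(query, features)[0])
```

Swapping VGG19 for any of the other models considered (VGG16, InceptionV3, Xception, ResNet50) only changes the import and the model constructor; the downstream classification and retrieval steps stay the same.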