This paper addresses the problem of recognizing semantic content in images and video for content-based retrieval. Semantic features are derived from a collection of low-level color, texture and shape features, which are combined into composite feature vectors. Both a Manhattan-distance classifier and neural networks are used for recognition. Discrimination is performed among five semantic classes, namely mountains, forests, flowers, highways and buildings. The composite feature is represented by a 26-element vector comprising 18 color components, 2 texture components and 6 shape components.
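The abstract does not state how Manhattan distance becomes a decision rule. A common choice is a minimum-distance classifier that assigns a vector to the class whose prototype is nearest. The Python sketch below assumes that rule and uses the mean training vector of each class as its prototype. The names `composite_feature`, `class_prototypes` and `classify` are hypothetical. Extraction of the 18 color, 2 texture and 6 shape components themselves is not shown.

```python
from typing import Dict, List, Sequence

# Component counts taken from the abstract: 18 + 2 + 6 = 26.
N_COLOR, N_TEXTURE, N_SHAPE = 18, 2, 6


def composite_feature(color: Sequence[float],
                      texture: Sequence[float],
                      shape: Sequence[float]) -> List[float]:
    """Concatenate color, texture and shape components into one 26-element vector."""
    if (len(color), len(texture), len(shape)) != (N_COLOR, N_TEXTURE, N_SHAPE):
        raise ValueError("expected 18 color, 2 texture and 6 shape components")
    return [float(v) for v in (*color, *texture, *shape)]


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 (city-block) distance: sum of absolute component differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum(abs(x - y) for x, y in zip(a, b))


def class_prototypes(samples: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Hypothetical prototype choice: the mean training vector of each class."""
    protos = {}
    for label, vecs in samples.items():
        n = len(vecs)
        protos[label] = [sum(col) / n for col in zip(*vecs)]
    return protos


def classify(x: Sequence[float], prototypes: Dict[str, List[float]]) -> str:
    """Assign x to the class whose prototype has the smallest Manhattan distance."""
    return min(prototypes, key=lambda label: manhattan(x, prototypes[label]))


if __name__ == "__main__":
    # Toy data only; no real image features are implied.
    protos = class_prototypes({
        "forests": [[0.0] * 26, [2.0] * 26],
        "buildings": [[10.0] * 26],
    })
    print(classify([1.5] * 26, protos))
```

Manhattan distance sums absolute differences over all 26 components, so components with larger numeric ranges dominate the result. In practice the features would usually be normalized first. The abstract does not describe that step.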