In this paper, a novel framework for multimodal search and retrieval of rich media objects is presented. The searchable items are media representations consisting of multiple modalities, such as 2D images, 3D objects and audio files, which share a common semantic concept. A manifold learning technique based on Laplacian Eigenmaps is appropriately modified to merge the low-level descriptors of each separate modality and create a new low-dimensional multimodal feature space, onto which all media objects can be mapped irrespective of their constituent modalities. To accelerate search and retrieval and make the framework suitable even for web-scale applications, a multimedia indexing scheme is adopted that represents each object of the dataset by its ordering of a set of reference objects. Moreover, the hubness property is introduced as a criterion for selecting the most representative reference objects, thus maximizing the performance of the indexing scheme. The content-based similarity of the multimodal descriptors is also used to automatically annotate the objects of the dataset with a predefined set of attributes. Finally, annotation propagation is employed to approximate the multimodal descriptors of multimodal queries that do not belong to the dataset.
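
To make the reference-object indexing and the hubness-based selection concrete, the following is a minimal sketch in Python. It assumes a precomputed pairwise distance matrix over the fused multimodal descriptors; the function names, the N_k-style hubness score and the Spearman-footrule comparison of permutations are illustrative choices and not necessarily the exact formulation adopted in the paper.

```python
import numpy as np

def hubness_scores(D, k=10):
    """N_k hubness score: how often each object appears in the k-NN lists of the others.
    D is a full (n x n) pairwise distance matrix over the multimodal descriptors."""
    n = D.shape[0]
    counts = np.zeros(n, dtype=int)
    for i in range(n):
        nn = np.argsort(D[i])[1:k + 1]   # k nearest neighbours of object i, excluding itself
        counts[nn] += 1
    return counts

def select_references(D, n_refs=50, k=10):
    """Pick the n_refs objects with the highest hubness as reference objects."""
    scores = hubness_scores(D, k)
    return np.argsort(scores)[::-1][:n_refs]

def permutation_codes(D, ref_idx):
    """Represent every object by the ordering (permutation) of the reference objects,
    from the closest reference to the farthest one."""
    return np.argsort(D[:, ref_idx], axis=1)

def footrule(p, q):
    """Spearman footrule distance between two reference-object permutations."""
    pos_p = np.argsort(p)   # position of each reference in permutation p
    pos_q = np.argsort(q)
    return int(np.sum(np.abs(pos_p - pos_q)))
```

Under these assumptions, a query is mapped to its own reference-object permutation and compared against the stored codes with the footrule distance, so candidate objects can be ranked without computing distances in the original descriptor space.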