Database technology can now host multimedia applications through the representation of sounds and images, but these new applications also require extensions to HCI technology. This paper examines the problems of querying and manipulating audio information. We argue that no single "style" of user interface can provide a complete solution, and propose two novel types of interface to complement conventional database languages. The first is gestural, and allows users literally to reach into spaces of sounds and to "grab" the required objects. The second involves retrieval by mimicry. The main part of this paper describes our research into the viability of the gestural interface. We have experimented with the ISEE (Intuitive Sound Editing Environment) interface, a four-dimensional perceptually-based space of sounds. Our experiments, covering a population of users and a range of multidimensional input devices, have provided strong evidence that the approach is viable, but also that the choice of input device has a significant impact on the usability of the system. The second proposed interface, which we are currently researching, involves the use of neural networks within the data model to derive perceptually-based attributes. The neural networks can be trained on expertly created sound spaces, together with vocal imitations of the sounds, and subsequently used to retrieve sounds on the basis of vocal imitations of the required sounds.
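The gestural "grab" described above can be illustrated, in highly simplified form, as a nearest-neighbour query over a four-dimensional space of perceptual sound descriptors. The following sketch is illustrative only: the axis semantics, catalogue of sounds, and coordinate values are hypothetical and are not taken from ISEE.

```python
import math

# Hypothetical 4-D perceptual coordinates (e.g. brightness, attack,
# richness, vibrato) for a small catalogue of sounds. The axes and
# values are illustrative, not ISEE's actual dimensions.
SOUNDS = {
    "flute":   (0.8, 0.2, 0.3, 0.4),
    "trumpet": (0.9, 0.7, 0.6, 0.2),
    "cello":   (0.3, 0.3, 0.8, 0.5),
    "organ":   (0.5, 0.1, 0.9, 0.1),
}

def grab(point, sounds=SOUNDS):
    """Model a gestural 'grab' at cursor position `point` as a
    nearest-neighbour query: return the sound whose coordinates
    lie closest to the cursor in the perceptual space."""
    return min(sounds, key=lambda name: math.dist(point, sounds[name]))

# A cursor position near the cello's coordinates retrieves "cello".
print(grab((0.35, 0.25, 0.75, 0.5)))  # prints "cello"
```

In a real system the cursor position would come from a multidimensional input device rather than a literal tuple, and the catalogue would be indexed (e.g. with a spatial data structure) rather than scanned linearly, but the retrieval principle is the same.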