Accurate voice humming transcription and efficient indexing and retrieval schemes are essential to a large-scale humming-based audio retrieval system. Although much research has been done to develop such schemes, their performance in terms of precision, recall, and F-measure, among all similarity metrics, are still unsatisfactory. In this paper, we propose a new voice query transcription scheme. It considers the following features: note onset detection using dynamic threshold methods, fundamental frequency (F0) acquisition of each frame, and frequency realignment using K-means. We use a popularity-adaptive indexing structure called frequently accessed index (FAI) based on frequently queried tunes for indexing purposes. In addition, we propose a semi-supervised relevance feedback and query reformulation scheme based on a genetic algorithm to improve retrieval efficiency. In this paper, we extend our efforts to mobile multimedia environments and develop a mobile audio retrieval system. Experiments show our system performs satisfactory in wireless mobile multimedia environments.