According to the situation that speech perceptual hashing methods are not appropriated for real-time speech content authentication in mobile computing environment, a novel DWT-based perceptual hashing algorithm, which uses a combination of time-domain and frequency-domain features, was proposed to protect the speech data in the cloud. Firstly, by discrete wavelet transform (DWT), a new signal in frequency-domain is generated from the original speech signal after pre-processing and intensity-loudness transform (ILT). Secondly, coefficients of low frequency wavelet decomposition are partitioned into equal-sized and non-overlapping blocks, and logarithmic short-time energy of each block is computed to obtain speech signal's features in frequency-domain. Finally, combined with spectral flux features (SFF) of speech signal in time-domain, a ternary perceptual hashing sequence is created. Experiment results illustrate that ternary form is better to stand for hash digest than binary form, the proposed algorithm has a good robustness against content preserving operations, discrimination, good compaction and high efficiency, and detects the tamper localisation as well.