Deep learning has made effective steganalysis of speech signals possible. However, the speech samples used to train speech steganalysis models (i.e., steganalyzers) are usually sensitive and distributed among different agencies, which makes it impractical to train an effective centralized steganalyzer. In this paper, we therefore present FedSpy, a framework based on federated learning that enables multiple agencies to securely and jointly train speech steganalysis models without sharing their speech samples. FedSpy is flexible and extensible, and can work in conjunction with various deep-learning-based speech steganalysis methods. We evaluate FedSpy on detecting the most widely used Quantization Index Modulation-based speech steganography, using three representative state-of-the-art deep-learning-based steganalysis methods. The results show that FedSpy significantly outperforms the local steganalyzers and achieves detection accuracy comparable to that of the centralized steganalyzer.