Thousands of species use vocal signals to communicate with one another. Vocalisations carry rich information, yet characterising and analysing these complex, high-dimensional signals is difficult and prone to human bias. Moreover, animal vocalisations are ethologically relevant stimuli whose representation by auditory neurons is an important subject of research in sensory neuroscience. A method that can efficiently generate naturalistic vocalisation waveforms would offer an unlimited supply of stimuli with which to probe neuronal computations. While unsupervised learning methods allow vocalisations to be projected into low-dimensional latent spaces learned from the waveforms themselves, and generative modelling allows novel vocalisations to be synthesised for use in downstream tasks, no existing method combines these capabilities to produce naturalistic vocalisation waveforms for stimulus playback. In this paper, we present BiWaveGAN: a bidirectional Generative Adversarial Network (GAN) capable of learning a latent representation of ultrasonic vocalisations (USVs) from mice. We show that BiWaveGAN can be used to generate, and interpolate between, realistic vocalisation waveforms. We then use these synthesised stimuli along with natural USVs to probe the sensory input space of mouse auditory cortical neurons. We show that stimuli generated by our method evoke neuronal responses as effectively as real vocalisations, and produce receptive fields with the same predictive power. BiWaveGAN is not restricted to mouse USVs but can be used to synthesise naturalistic vocalisations of any animal species, and to interpolate between vocalisations of the same or different species, which could be useful for probing categorical boundaries in representations of ethologically relevant auditory signals.
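
To make the bidirectional-GAN idea concrete, the sketch below illustrates the three components such a model pairs together (an encoder from waveform to latent code, a generator from latent code to waveform, and a discriminator over joint waveform–latent pairs) and how latent interpolation between two encoded calls yields a novel waveform. This is our own minimal illustration, not the authors' implementation: the module architectures, `LATENT_DIM`, `WAVE_LEN`, and the stand-in random waveforms are all assumed placeholders, whereas a real model would use WaveGAN-style 1-D convolutions trained adversarially on recorded USVs.

```python
# Minimal, self-contained sketch of a bidirectional GAN for waveforms
# (PyTorch). All shapes and architectures are illustrative assumptions.
import torch
import torch.nn as nn

LATENT_DIM = 64    # size of the learned latent space (assumed)
WAVE_LEN = 16384   # audio samples per vocalisation (assumed)

class Encoder(nn.Module):
    """Maps a waveform x to a latent code z (the 'bidirectional' part)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(WAVE_LEN, 256), nn.ReLU(),
            nn.Linear(256, LATENT_DIM),
        )
    def forward(self, x):
        return self.net(x)

class Generator(nn.Module):
    """Maps a latent code z back to a waveform."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, WAVE_LEN), nn.Tanh(),
        )
    def forward(self, z):
        return self.net(z)

class JointDiscriminator(nn.Module):
    """Scores (waveform, latent) pairs; adversarial training pushes it to
    confuse real pairs (x, E(x)) with generated pairs (G(z), z)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(WAVE_LEN + LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )
    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=-1))

E, G = Encoder(), Generator()

# Latent interpolation between two (here random, stand-in) vocalisations:
# encode both calls, blend the codes, and decode the blend to a waveform.
x1 = torch.randn(1, WAVE_LEN)   # stand-in for a recorded USV
x2 = torch.randn(1, WAVE_LEN)   # stand-in for a second USV
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    z = (1 - alpha) * E(x1) + alpha * E(x2)
    x_interp = G(z)             # synthetic waveform between the two calls
    print(alpha, x_interp.shape)
```

Because the encoder and generator are trained jointly, interpolating in the latent space and decoding tends to produce waveforms that stay on the learned manifold of natural calls, which is what makes such stimuli usable for playback experiments.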