State-of-the-art BMIs succeed at decoding behavioral intention from brain activity by mapping features of neuronal ensemble activity onto a motor space 1,2. Yet these motor spaces are confined by current technologies to rather simple actions. To prototype a decoder of complex, natural communication signals from neural activity, we capitalize on two aspects of birdsong, a powerful animal model for vocal learning that shares many features with human speech 3,4. First, birdsong is temporally structured (like human speech); this temporal patterning can be built into a decoder using a recurrent neural network 5. Second, the biomechanics of birdsong production are well understood; this enables us to employ a biophysical model of the vocal organ that captures most of the complexity of the song and reduces it to a lower-dimensional parameter space 6,7. By combining these techniques, we decode realistic synthetic birdsong directly from neural activity.

Our decoder interfaces with the sensory-motor nucleus HVC (used as a proper name), where neurons generate high-level motor commands that shape the production of learned song. Adult zebra finches (Taeniopygia guttata) sing renditions of a stereotyped motif (a sequence of 3-10 syllables), whose temporal and/or motor structure is thought to be encoded in the activities of two major types of HVC neurons (Fig. 1a) 8-13. We implanted 16/32-site Si-probes in adult male zebra finches and simultaneously recorded their song and neural activity in HVC; we then used these data to train a long short-term memory network (LSTM 5) to translate neural activity directly into song. The goal of the network is to predict the spectral components of the song at a time bin t_i, given the values of the neural activity features over the previous time bins (t_{i-1}, t_{i-2}, ...) (Fig. 1d-f). The neural activity is fed as a matrix comprising the mean firing rate, in each time bin, of each putative single/multi-unit automatically sorted from the recordings 14 (32/64 clusters); the spectral components of the song are represented by the power across 64 log-spaced frequency bands. For each session (day), we separate the 70-110 renditions of the motif that the bird sang. We then train the LSTM network to find the neural-to-song-spectrum mappings, decode the corresponding spectral components from a test set of neural activity, and finally recover waveforms of synthetically generated song motifs.

We employed several methods to avoid overfitting. First, the order in which each neural feature window/target pair was presented to the network was randomized, so that the predictions of the spectral components at each time point are independent; second, we used standard techniques such as L2 weight regularization, dropout and early stopping (see Methods). We also employed two different procedures to produce the training/validation and test sets. For motif-wise training/decoding, we split the data into
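To make the decoding pipeline concrete, the sketch below shows one way such a neural-to-spectrum LSTM could be set up: windows of binned firing rates are paired with the 64-band spectral frame that follows them, the window/target pairs are shuffled during training, and L2 weight regularization, dropout and early stopping constrain overfitting. This is an illustrative reconstruction, not the published implementation; the layer sizes, window length, hyperparameters and the choice of the Keras API are assumptions.

```python
# Minimal sketch (assumed, not the authors' code) of an LSTM decoder mapping
# windows of binned HVC firing rates onto 64 log-spaced spectral bands of song.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, regularizers, callbacks

N_CLUSTERS = 64      # putative single/multi-units (32/64 clusters in the text)
WINDOW_BINS = 20     # number of preceding time bins fed to the decoder (assumed)
N_FREQ_BANDS = 64    # log-spaced spectral bands of the song


def build_decoder():
    """LSTM regressor: window of firing rates -> spectral frame at time bin t_i."""
    model = tf.keras.Sequential([
        layers.Input(shape=(WINDOW_BINS, N_CLUSTERS)),
        layers.LSTM(128, kernel_regularizer=regularizers.l2(1e-4)),  # L2 weight regularization
        layers.Dropout(0.3),                                         # dropout
        layers.Dense(N_FREQ_BANDS),                                  # power in each frequency band
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


def make_windows(rates, spectra):
    """Pair each spectral frame with the window of firing rates that precedes it.

    rates:   (n_bins, N_CLUSTERS) mean firing rates per time bin
    spectra: (n_bins, N_FREQ_BANDS) log-power spectrogram of the song
    """
    X = np.stack([rates[i - WINDOW_BINS:i] for i in range(WINDOW_BINS, len(rates))])
    y = spectra[WINDOW_BINS:]
    return X, y


if __name__ == "__main__":
    # Placeholder arrays standing in for one session's motif renditions.
    rates = np.random.rand(5000, N_CLUSTERS)
    spectra = np.random.rand(5000, N_FREQ_BANDS)
    X, y = make_windows(rates, spectra)

    model = build_decoder()
    model.fit(
        X, y,
        validation_split=0.1,
        shuffle=True,                     # randomize window/target pairs each epoch
        epochs=200,
        batch_size=64,
        callbacks=[callbacks.EarlyStopping(patience=10, restore_best_weights=True)],
    )
    # Spectral frames decoded from held-out neural activity; inverting them
    # (e.g. with a phase-reconstruction method such as Griffin-Lim) would yield
    # a synthetic song waveform.
    predicted_spectra = model.predict(X[-100:])
```

In this sketch the independence of predictions at each time point comes from treating every window/target pair as a separate sample and shuffling them, matching the randomization described above; the biophysical-model variant of the decoder would replace the 64-band target with the lower-dimensional parameters of the vocal-organ model.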