Objective. The goal of this study is to decode the electrical activity of single neurons in the human subthalamic nucleus (STN) to infer the speech features that a person articulated, heard or imagined. We also aim to evaluate the number of subthalamic neurons required for decoding accurate enough for real-life speech brain-machine interfaces. Approach. We intraoperatively recorded single-neuron activity in the STN of 21 neurosurgical patients with Parkinson's disease undergoing implantation of deep brain stimulators (DBS) while the patients produced, perceived or imagined the five monophthongal vowel sounds. Our decoder is based on machine learning algorithms that dynamically learn specific features of the speech-related firing patterns. Main results. In an extensive comparison of algorithms, our sparse decoder ("SpaDe"), based on sparse decomposition of the high-dimensional neuronal feature space, outperformed the other algorithms in all three conditions: production, perception and imagery. For speech production, SpaDe predicted all vowels correctly (accuracy: 100%; chance level: 20%). For perception, accuracy was 96%, and for imagery, 88%. The accuracy of SpaDe increased linearly with the number of neurons for the perception data, and even faster for production and imagery. Significance. Our study demonstrates that the information encoded by single neurons in the STN about the production, perception and imagery of speech is suitable for high-accuracy decoding. It is therefore an important step towards brain-machine interfaces for restoring speech faculties, which have enormous potential to alleviate the suffering of completely paralyzed ("locked-in") patients and allow them to communicate with their environment again.
Moreover, our research indicates how many subthalamic neurons may be necessary to achieve a given level of decoding accuracy, which is of great importance for a neurosurgeon planning the implantation of a speech brain-machine interface.
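The abstract does not describe SpaDe's internals. Purely as a generic illustration of what "classification by sparse decomposition of a neuronal feature space" can mean, the following sketch implements textbook sparse-representation classification (SRC) with orthogonal matching pursuit in NumPy. This is a swapped-in standard technique, not the authors' SpaDe decoder. The feature layout (one spike-count vector per trial), the synthetic data, and the names `omp`, `src_predict` and `make_trials` are all hypothetical assumptions.

```python
import numpy as np


def omp(D, y, n_nonzero):
    """Greedy orthogonal matching pursuit: sparse x with y ~= D @ x.

    D has unit-norm columns (one column per training trial).
    """
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the training trial (atom) most correlated with what is left to explain.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Refit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x


def src_predict(D, train_labels, y, n_nonzero=5):
    """Sparse-representation classification: class whose atoms best rebuild y."""
    y = y / np.linalg.norm(y)
    x = omp(D, y, n_nonzero)
    classes = np.unique(train_labels)
    # Keep only the coefficients of one class at a time and measure the residual.
    residuals = [np.linalg.norm(y - D @ np.where(train_labels == c, x, 0.0))
                 for c in classes]
    return classes[int(np.argmin(residuals))]


def make_trials(rng, n_classes=5, n_neurons=30, trials_per_class=8):
    """Hypothetical spike counts: each class ('vowel') drives its own neuron subset."""
    rates = np.full((n_classes, n_neurons), 2.0)
    tuned = n_neurons // n_classes
    for c in range(n_classes):
        rates[c, c * tuned:(c + 1) * tuned] += 10.0
    X = np.vstack([rng.poisson(rates[c], size=(trials_per_class, n_neurons))
                   for c in range(n_classes)]).astype(float)
    labels = np.repeat(np.arange(n_classes), trials_per_class)
    return X, labels


rng = np.random.default_rng(0)
X_train, y_train = make_trials(rng)
X_test, y_test = make_trials(rng, trials_per_class=4)

# Dictionary: training trials as unit-norm columns (features x trials).
D = X_train.T / np.linalg.norm(X_train.T, axis=0, keepdims=True)
pred = np.array([src_predict(D, y_train, x) for x in X_test])
accuracy = float(np.mean(pred == y_test))
print(f"accuracy: {accuracy:.2f} (chance: {1 / 5:.2f})")
```

With five vowel classes, chance level is 1/5 = 20%, as quoted in the results. The synthetic classes here are deliberately well separated, so the printed accuracy says nothing about real STN recordings; it only shows the mechanics of decoding by sparse decomposition.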