This paper presents an accurate and robust embedded motor-imagery brain-computer interface (MI-BCI). The proposed novel model, based on EEGNet [1], fits the memory footprint and computational constraints of low-power microcontroller units (MCUs), such as the ARM Cortex-M family. Furthermore, the paper presents a set of methods, including temporal downsampling, channel selection, and narrowing of the classification window, to further scale down the model and relax memory requirements with negligible accuracy degradation. Experimental results on the Physionet EEG Motor Movement/Imagery Dataset show that standard EEGNet achieves 82.43%, 75.07%, and 65.07% classification accuracy on 2-, 3-, and 4-class MI tasks in global validation, outperforming the state-of-the-art (SoA) convolutional neural network (CNN) by 2.05%, 5.25%, and 6.49%. Our novel methods further scale down the standard EEGNet with a 7.6× memory footprint reduction at a negligible accuracy loss of 0.31%, and with a 15× reduction at a small accuracy loss of 2.51%. The scaled models are deployed on a commercial Cortex-M4F MCU, where the smallest model takes 101 ms and consumes 4.28 mJ per inference, and on a Cortex-M7, where the medium-sized model takes 44 ms and consumes 18.1 mJ per inference, enabling a fully autonomous, wearable, and accurate low-power BCI.
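The three scaling methods named above all act on the input trial before it reaches the network. A minimal NumPy sketch of how such preprocessing might look is given below; the function name `scale_down_trial`, the parameters `ds_factor`, `keep_channels`, and `window_samples`, and the particular channel subset are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def scale_down_trial(eeg, ds_factor=2, keep_channels=None, window_samples=None):
    """Shrink one EEG trial (channels x samples) along the three axes
    described in the abstract: temporal downsampling, channel selection,
    and narrowing of the classification window. Illustrative sketch only."""
    # Temporal downsampling: keep every ds_factor-th sample.
    # (A real pipeline would low-pass filter first to avoid aliasing.)
    eeg = eeg[:, ::ds_factor]
    # Channel selection: keep only the chosen electrode subset.
    if keep_channels is not None:
        eeg = eeg[keep_channels, :]
    # Narrower classification window: crop to the first window_samples.
    if window_samples is not None:
        eeg = eeg[:, :window_samples]
    return eeg

# Example: a 64-channel, 3 s trial at the dataset's 160 Hz sampling rate,
# reduced to 8 channels, 80 Hz, and a 2 s window (all values hypothetical).
trial = np.random.randn(64, 480)
small = scale_down_trial(trial, ds_factor=2,
                         keep_channels=list(range(8)),
                         window_samples=160)
print(small.shape)  # (8, 160)
```

Each step shrinks the input tensor, and thus the model's first-layer feature maps, which is what drives the memory footprint reduction reported above.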