This paper presents the first publicly available bimodal electroencephalography (EEG) / functional magnetic resonance imaging (fMRI) dataset and an open-source benchmark for inner speech decoding. Decoding inner speech (thought expressed through an inner voice without actual speaking) is challenging, with typical results close to chance level. The dataset comprises 1280 trials in each modality (4 subjects, 8 stimuli = 2 categories × 4 words, and 40 trials per stimulus). For binary classification, the pilot study reports a mean accuracy of 71.72% when combining the two modalities (EEG and fMRI), compared with 62.81% and 56.17% when using EEG and fMRI alone, respectively. The same improvement can be observed for word classification (8 classes): 30.29% with the combination, versus 22.19% and 17.50% without. As such, this paper demonstrates that combining EEG with fMRI is a promising direction for inner speech decoding.
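As a quick sanity check on the figures above, the following minimal sketch recomputes the trial count per modality and compares the reported accuracies with chance level. It assumes balanced classes, so that chance is 1/k for k classes; that assumption, and all variable names, are illustrative and not taken from the benchmark code.

```python
# Design figures quoted in the abstract.
n_subjects = 4
n_categories = 2
n_words_per_category = 4
n_trials_per_stimulus = 40

n_stimuli = n_categories * n_words_per_category            # 8 stimuli
n_trials = n_subjects * n_stimuli * n_trials_per_stimulus  # trials per modality

# Chance level in percent, ASSUMING balanced classes (1/k for k classes).
chance = {
    "binary": 100.0 / n_categories,
    "word": 100.0 / n_stimuli,
}

# Mean accuracies (percent) as reported in the abstract.
# For the word task the abstract only labels the two lower figures "without"
# the combination, so they are kept unlabeled here.
reported = {
    "binary": {"EEG+fMRI": 71.72, "EEG": 62.81, "fMRI": 56.17},
    "word": {"combined": 30.29, "without (first)": 22.19, "without (second)": 17.50},
}

# Margin above chance, in percentage points, for each task and setting.
margins = {
    task: {name: round(acc - chance[task], 2) for name, acc in scores.items()}
    for task, scores in reported.items()
}

print(f"trials per modality: {n_trials}")
for task, scores in margins.items():
    print(f"{task} (chance {chance[task]:.2f}%): {scores}")
```

Under the balanced-class assumption, chance is 50% for the binary task and 12.5% for the 8-class word task, so every reported accuracy lies above chance and the combined setting has the largest margin in both tasks.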