In the neurobiology of language, neuroimaging studies are generally based on stimulation paradigms consisting of at least two different conditions. Depending on the intended analysis, each condition in turn has to contain dozens of items to achieve a good signal-to-noise ratio. Designing such paradigms can be very time-consuming. A group of participants is then presented with the new paradigm while brain activity is recorded, e.g. with EEG or MEG. The measured data are pre-processed and finally contrasted according to the different stimulus conditions. In this way, only a limited number of analyses and hypothesis tests can be performed, whereas alternative or further analyses usually require completely new paradigms to be designed. This traditional approach is necessarily data-limited, and its cost-benefit ratio is therefore rather poor. In computational linguistics, by contrast, analyses are based on text corpora, which allow a wide variety of hypotheses to be tested by repeatedly re-evaluating the same data set. Text corpora also allow exploratory data analysis in order to generate new hypotheses. Combining the two approaches, we here present a unified approach that pairs continuous natural speech with MEG to generate a corpus-like database of speech-evoked neuronal activity.