A frequently addressed challenge in neuroscience research is the assignment of different tasks to specific brain regions. In many cases, several brain regions are activated during a single task. One is therefore also interested in the temporal evolution of brain activity in order to infer causal relations between the activated brain regions. These causal relations may be described by a directed, task-specific network whose vertices are the activated brain regions and whose directed edges represent the causal relations. Inferring the task-specific brain network from measurements such as electroencephalography (EEG) or functional magnetic resonance imaging (fMRI) is challenging because of the low spatial resolution of the former and the low temporal resolution of the latter. Here, we present a simulation study investigating a possible combined analysis of simultaneously measured EEG and fMRI data to address this challenge. A nonlinear state space model is used to distinguish between the underlying brain states and the (simulated) EEG/fMRI measurements. We use a modified unscented Kalman filter and a corresponding unscented smoother to estimate the underlying neural activity. Model parameters are estimated using an expectation-maximization algorithm that exploits the partial linearity of our model. The brain network structure is then inferred using directed partial correlation, a measure of Granger causality. The results indicate that the convolution effect of the fMRI forward model poses a major challenge for parameter estimation and reduces the influence of the fMRI data in combined EEG-fMRI models. It remains to be investigated whether other models, or similar combinations of other modalities such as EEG and magnetoencephalography, can increase the benefit of the promising idea of combining various modalities.
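To illustrate the building block behind an unscented Kalman filter, the following is a minimal sketch of the standard textbook unscented transform, which propagates a mean and covariance through a nonlinear function using deterministically chosen sigma points. It is not the modified filter used in this study. The function name `unscented_transform`, the scaling parameter `kappa`, and the example nonlinearity are assumptions made purely for illustration.

```python
import numpy as np

def unscented_transform(mean, cov, f, kappa=1.0):
    """Propagate (mean, cov) through f using 2n+1 sigma points.

    Hypothetical helper for illustration; kappa is an assumed scaling
    parameter and must satisfy n + kappa > 0.
    """
    n = mean.shape[0]
    # Matrix square root of the scaled covariance: L @ L.T == (n + kappa) * cov
    L = np.linalg.cholesky((n + kappa) * cov)
    # Sigma points: the mean itself, plus and minus each column of L
    sigma = np.vstack([mean, mean + L.T, mean - L.T])  # shape (2n+1, n)
    # Weights sum to one; the central point gets kappa / (n + kappa)
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    # Push every sigma point through the nonlinearity
    y = np.array([f(s) for s in sigma])
    mean_y = w @ y
    diff = y - mean_y
    cov_y = (w[:, None] * diff).T @ diff
    return mean_y, cov_y

# Example with an arbitrary, made-up nonlinearity
m = np.array([1.0, -1.0])
P = np.array([[2.0, 0.5], [0.5, 1.0]])
mean_y, cov_y = unscented_transform(
    m, P, lambda x: np.array([np.sin(x[0]), x[0] * x[1]])
)
```

For a linear map the transform reproduces the exact propagated mean and covariance, which is a convenient sanity check. A full filter would additionally account for process and measurement noise and compute a Kalman gain from the cross-covariance of states and predicted measurements.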