Motor imagery-based brain-computer interfaces (BCIs) use an individual's ability to volitionally modulate localized brain activity, often as a therapy for motor dysfunction or to probe causal relations between brain activity and behavior. However, many individuals cannot learn to successfully modulate their brain activity, greatly limiting the efficacy of BCIs for therapy and for basic scientific inquiry. Formal experiments designed to probe the nature of BCI learning have offered initial evidence that coherent activity across spatially distributed and functionally diverse cognitive systems is a hallmark of individuals who can successfully learn to control the BCI. However, little is known about how these distributed networks interact through time to support learning. Here, we address this gap in knowledge by constructing and applying a multimodal network approach to decipher brain-behavior relations in motor imagery-based BCI learning using magnetoencephalography. Specifically, we employ a minimally constrained matrix decomposition method, non-negative matrix factorization, to simultaneously identify regularized, covarying subgraphs of functional connectivity, to assess their similarity to task performance, and to detect their time-varying expression. We find that learning is marked by diffuse brain-behavior relations: good learners displayed many subgraphs whose temporal expression tracked performance. Individuals also displayed marked variation in the spatial properties of subgraphs, such as the connectivity between the frontal lobe and the rest of the brain, and in the temporal properties of subgraphs, such as the stage of learning at which they reached maximum expression. From these observations, we posit a conceptual model in which certain subgraphs support learning by modulating brain activity in regions important for sustaining attention.
To test this model, we use tools that stipulate regional dynamics on a networked system (network control theory), and find that good learners displayed a single subgraph whose temporal expression tracked performance and whose architecture supported easy modulation of brain regions important for attention. The nature of our contribution to the neuroscience of BCI learning is therefore both computational and theoretical: we first use a minimally constrained, individual-specific method of identifying mesoscale structure in dynamic brain activity to show how global connectivity and interactions between distributed networks support BCI learning, and then we use a formal network model of control to lend theoretical support to the hypothesis that these identified subgraphs are well suited to modulate attention.
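
To make the decomposition step concrete, the following is a minimal sketch of non-negative matrix factorization applied to a toy configuration matrix. It is an illustration under stated assumptions, not the pipeline used in this work: the data are synthetic, the helper `nmf` and all variable names are hypothetical, task performance is assumed to be appended as one extra row of the matrix, and plain multiplicative updates are used in place of a regularized solver.

```python
import numpy as np

def nmf(A, k, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix A as W @ H with W, H >= 0.

    Uses multiplicative updates for the Frobenius-norm loss.
    No sparsity or other regularization is applied (an assumption
    made to keep the sketch short).
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = A.shape
    W = rng.random((n_rows, k)) + eps
    H = rng.random((k, n_cols)) + eps
    for _ in range(n_iter):
        # Update temporal coefficients, then subgraph weights.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic stand-in for real data: rows are functional-connectivity
# edges plus one behavioral row, columns are time windows.
rng = np.random.default_rng(1)
n_edges, n_windows, k = 15, 40, 3
true_W = rng.random((n_edges + 1, k))
true_H = rng.random((k, n_windows))
A = true_W @ true_H  # last row plays the role of task performance

W, H = nmf(A, k)

subgraphs = W[:-1]            # edge weights of each subgraph
performance_loading = W[-1]   # how strongly each subgraph loads on behavior
expression = H                # time-varying expression of each subgraph

rel_error = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
```

In this sketch, each column of `W` is one subgraph, its last entry indicates how closely that subgraph's expression follows the behavioral row, and the matching row of `H` gives the subgraph's expression across time windows. The number of subgraphs `k` is fixed by hand here, whereas in practice it would need to be selected from the data.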