In this commentary, we clarify the meaning of the generalizability-theory-based coefficients reported in our multisite reliability study of fMRI measures of regional brain activation during an emotion processing task (Gee et al., Human Brain Mapping 2015;36:2558-2579). The original paper reported generalizability and dependability coefficients based on the design of our traveling-subjects study, in which each subject was scanned twice at each of eight sites. Those coefficients are of limited applicability outside the reliability-study context. Here we report generalizability and dependability coefficients that represent the reliability one can expect in a multisite study in which a given subject is scanned once, on a scanner drawn randomly from the pool of available scanners (i.e., the more typical multisite study design). We also characterize the implications of a multisite versus single-site design for statistical power. This includes Figure 1, which shows the sample sizes required to detect activation in two key nodes of the emotion processing circuitry, given the observed differences in measurement reliability between single-site and multisite designs.
We take this opportunity to clarify the meaning of the statistics reported in our study examining the reliability of fMRI measures of brain activation during an emotion processing task (Gee et al., 2015) and to consider their implications for statistical power in single-site versus multisite designs. In our report, we used a variance components framework and an application of generalizability theory (Shavelson & Webb, 1991) to probe the robustness of such measures in a multisite context.
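To make the distinction between the two kinds of coefficients concrete, the sketch below computes a generalizability coefficient (G, relative measurement) and a dependability coefficient (Phi, absolute measurement) from variance components. It then translates Phi into an approximate sample size. This is a minimal illustration under stated assumptions, not the procedure used in Gee et al. (2015) or for Figure 1. It assumes a fully crossed persons × (days within sites) random-effects design and the standard generalizability-theory error formulas (Shavelson & Webb, 1991). The class `VarianceComponents`, the functions `coefficients` and `required_n`, and all numeric values are hypothetical placeholders, not estimates from our data. The power step further assumes that measurement error simply inflates the between-subject standard deviation, so that the observed effect size equals the true effect size times sqrt(Phi). It uses a normal approximation and ignores the clustering of subjects within sites.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class VarianceComponents:
    """Hypothetical variance components for an assumed persons x (days:sites) design.

    Field names and the example values below are illustrative placeholders only.
    """
    person: float       # sigma^2_p: stable between-subject differences
    site: float         # sigma^2_s: site/scanner main effect
    person_site: float  # sigma^2_ps: person-by-site interaction
    day_in_site: float  # sigma^2_d:s: session (day) within site
    residual: float     # sigma^2_pd:s,e: person-by-session interaction plus error


def coefficients(vc: VarianceComponents, n_sites: int, n_days: int) -> tuple[float, float]:
    """Return (G, Phi) when each subject's score averages n_sites scanners
    and n_days sessions per scanner, assuming a crossed random D-study."""
    n_sessions = n_sites * n_days
    # Relative error: only the components that change how subjects rank.
    rel_err = vc.person_site / n_sites + vc.residual / n_sessions
    # Absolute error: also counts shifts shared by everyone at a site or session.
    abs_err = rel_err + vc.site / n_sites + vc.day_in_site / n_sessions
    g = vc.person / (vc.person + rel_err)
    phi = vc.person / (vc.person + abs_err)
    return g, phi


def required_n(d_true: float, reliability: float,
               alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate N for a two-sided one-sample test of mean activation.

    Assumes the observed standardized effect is d_true * sqrt(reliability),
    where d_true is the mean divided by the true-score (person) SD.
    Uses a normal approximation rather than the noncentral t distribution.
    """
    z = NormalDist()
    d_obs = d_true * math.sqrt(reliability)
    n = ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d_obs) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Purely hypothetical components that sum to 1.0; not values from Gee et al. (2015).
    vc = VarianceComponents(person=0.30, site=0.05, person_site=0.10,
                            day_in_site=0.05, residual=0.50)
    designs = [("traveling-subjects design (8 sites x 2 days)", 8, 2),
               ("one scan on one randomly drawn scanner", 1, 1)]
    for label, n_s, n_d in designs:
        g, phi = coefficients(vc, n_s, n_d)
        print(f"{label}: G={g:.2f}, Phi={phi:.2f}, "
              f"N for d=0.5: {required_n(0.5, phi)}")
```

With `n_sites = n_days = 1`, the absolute error includes every non-person component at full weight. Phi then reduces to the person variance divided by the sum of all components, which is the proportion of variance due to person. With 8 sites and 2 days, the error terms are divided by up to 16, so both coefficients rise. Under these assumptions, the required N scales roughly as 1/Phi. A measure whose single-scan dependability is half as large therefore needs about twice as many subjects.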
Given the design of our study, in which eight human subjects were scanned twice on successive days at each of eight sites, the proportion of variance due to person from the variance components analysis (shown in Figure 3 in Gee et al., 2015) represents the reliability one can expect in a typical multisite study. In such a study, each subject's measurement is based on single-session fMRI data, acquired on whichever scanner serves the site where that subject was recruited. We wish to make explicit how we applied generalizability theory. We estimated reliability by calculating generalizability and dependability coefficients for a study design corresponding to the full traveling-subjects study. These coefficients therefore reflect the reliability of relative and absolute measurement, respectively, that one can expect when every subject is scanned twice on each of eight different scanners. The corresponding generalizability and dependability coefficients (shown in Figure 4 and cited in the abstract in Gee et al., 2015) ranged from 0.0 to 0.9 for maximum activation across multiple task contrasts and regions of interest. They were nonetheless generally at or above 0.5, as would be expected when each subject's measurement is based on the aggregation of 16 scan sessions. Thus, the coefficients reported apply to the reliability of the measures from the reliability study itself, that is, for...