The multilevel latent growth curve model (MLGCM), which is subsumed by the multilevel structural equation modeling framework, has been advocated as a means of investigating individual and cluster trajectories. Still, how to evaluate the goodness of fit of MLGCMs has not been well addressed. The purpose of this study was to conduct a systematic Monte Carlo simulation to carefully investigate the effectiveness of (a) level-specific fit indices and (b) target-specific fit indices in an MLGCM, in terms of their independence from the sample size's influence and their sensitivity to misspecification in the MLGCM that occurs in either the between-covariance, between-mean, or within-covariance structure. The design factors included the number of clusters, the cluster size, and the model specification. We used Mplus 7.4 to generate simulated replications and estimate each of the models. We appropriately controlled the severity of misspecification when we generated the simulated replications. The simulation results suggested that applying RMSEA_T_S_COV, TLI_T_S_COV, and SRMR_B maximizes the capacity to detect misspecifications in the between-covariance structure. Moreover, the use of RMSEA_PS_B, CFI_PS_B, and TLI_PS_B is recommended for evaluating the fit of the between-mean structure. Finally, we found that evaluation of the within-covariance structure turned out to be unexpectedly challenging, because none of the within-level-specific fit indices (RMSEA_PS_W, CFI_PS_W, TLI_PS_W, and SRMR_W) had a practically significant sensitivity.

Keywords: Fit index; Model evaluation; Multilevel latent growth curve model; Multilevel structural equation modeling

A panel study is a powerful longitudinal design in which data are observed or gathered from exactly the same people, group, or organization across multiple time points (Neuman, 2009).
Panel studies allow researchers to investigate a moving picture of observed units over time (i.e., a trajectory), rather than a single snapshot, as in cross-sectional studies. During the past few decades, two-stage cluster sampling (TCS) has been widely adopted for most large-scale panel studies (e.g., the Education Longitudinal Study of 2002, Ingels et al.