Multilevel models are an increasingly popular method to analyze data that originate from a clustered or hierarchical structure. To effectively utilize multilevel models, one must have an adequately large number of clusters; otherwise, some model parameters will be estimated with bias. The goals for this paper are to (1) raise awareness of the problems associated with a small number of clusters, (2) review previous studies on multilevel models with a small number of clusters, (3) provide an illustrative simulation to demonstrate how a simple model becomes adversely affected by small numbers of clusters, (4) provide researchers with remedies if they encounter clustered data with a small number of clusters, and (5) outline methodological topics that have yet to be addressed in the literature.

Keywords: Multilevel model · HLM · Small sample · Mixed model · Small number of clusters

Frequently in educational psychology research, observations have a hierarchical structure (Raudenbush and Bryk 2002). Students are nested within classrooms, children are nested within families, or teachers are nested within schools. When data are sampled in a multistage manner or observations are otherwise clustered, a model that ignores the clustering will often underestimate standard errors if the outcome variable demonstrates dependence based on the clustering (i.e., the intraclass correlation is greater than zero). When clustering is ignored, the residuals will not be independently and identically distributed, violating an assumption of single-level models such as ordinary least-squares regression. This dependence will ultimately result in an inflated type-I error rate for significance tests of regression coefficients. The statistical literature, however, offers methods that address data from a hierarchical structure and account for the dependence among observations.
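The inflation of the type-I error rate described above can be checked with a small Monte Carlo sketch. This is not the simulation reported in this paper; it is a minimal illustration under assumed settings (20 clusters of 30 observations, an intraclass correlation of 0.2, a cluster-level predictor whose true effect is zero, and a normal-approximation critical value of 1.96). The function names `ols_slope_t` and `type1_rate` are hypothetical helpers introduced only for this example.

```python
import math
import random


def ols_slope_t(x, y):
    """Return the t statistic for the slope in a simple OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    # Naive standard error: treats all n observations as independent.
    se = math.sqrt(sse / (n - 2) / sxx)
    return slope / se


def type1_rate(icc, n_clusters=20, cluster_size=30, n_reps=500, seed=1):
    """Share of replications in which single-level OLS rejects a true null slope.

    All default values are assumptions chosen for illustration only.
    """
    rng = random.Random(seed)
    sd_between = math.sqrt(icc)       # cluster-level (random intercept) SD
    sd_within = math.sqrt(1.0 - icc)  # observation-level SD; total variance is 1
    rejections = 0
    for _ in range(n_reps):
        x, y = [], []
        for _ in range(n_clusters):
            x_j = rng.gauss(0.0, 1.0)         # cluster-level predictor, true slope is 0
            u_j = rng.gauss(0.0, sd_between)  # shared cluster effect induces dependence
            for _ in range(cluster_size):
                x.append(x_j)
                y.append(u_j + rng.gauss(0.0, sd_within))
        if abs(ols_slope_t(x, y)) > 1.96:     # nominal 5% two-sided test
            rejections += 1
    return rejections / n_reps


if __name__ == "__main__":
    print("ICC = 0.0:", type1_rate(0.0))
    print("ICC = 0.2:", type1_rate(0.2))
```

With an intraclass correlation of zero, the rejection rate should stay near the nominal 5%. With a positive intraclass correlation, the naive standard error is too small and the rejection rate climbs well above the nominal level.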
One such method has many names and acronyms but is often referred to as hierarchical linear models (HLMs), multilevel models (MLMs, the term used in this paper), or mixed-effects models (Raudenbush and Bryk 2002). This is the method on which this paper will focus.

To estimate MLMs without bias, adequate sample sizes must be obtained, since MLMs are often estimated with maximum likelihood (ML) methods. ML estimates are asymptotically

Educ Psychol Rev