Cluster-randomized experiments are widely used due to their logistical convenience and policy relevance. To analyse them properly, we must address the fact that the treatment is assigned at the cluster level instead of the individual level. Standard analytic strategies are regressions based on individual data, cluster averages and cluster totals, which differ when the cluster sizes vary. These methods are often motivated by models with strong and unverifiable assumptions, and the choice among them can be subjective. Without any outcome modelling assumption, we evaluate these regression estimators and the associated robust standard errors from the design-based perspective where only the treatment assignment itself is random and controlled by the experimenter. We demonstrate that regression based on cluster averages targets a weighted average treatment effect, regression based on individual data is suboptimal in terms of efficiency and regression based on cluster totals is consistent and more efficient with a large number of clusters. We highlight the critical role of covariates in improving estimation efficiency and illustrate the efficiency gain via both simulation studies and data analysis. The asymptotic analysis also reveals the efficiency-robustness trade-off by comparing the properties of various estimators using data at different levels with and without covariate adjustment. Moreover, we show that the robust standard errors are convenient approximations to the true asymptotic
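To make concrete how the three regressions can differ when cluster sizes vary, the following is a minimal sketch rather than the estimators as formally defined in the paper. It relies on the fact that the ordinary least squares coefficient of a binary treatment indicator, in a regression with an intercept and no covariates, equals the difference in means between the two arms. The function name `cluster_estimators`, the toy data, and the rescaling of cluster totals by M/N (number of clusters over number of units) are assumptions made for illustration; standard errors and covariate adjustment are omitted.

```python
import numpy as np


def cluster_estimators(y, cluster, z):
    """Unadjusted point estimates from three regressions (hypothetical helper).

    y       : unit-level outcomes, length N
    cluster : integer cluster label in 0..M-1 for each unit, length N
    z       : cluster-level 0/1 treatment assignment, length M
    """
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)
    z = np.asarray(z)
    M, N = z.size, y.size

    sizes = np.bincount(cluster, minlength=M)              # units per cluster
    totals = np.bincount(cluster, weights=y, minlength=M)  # sum of outcomes per cluster
    averages = totals / sizes                              # cluster averages
    # Assumption of this sketch: rescale totals by M/N so that their mean
    # across clusters is on the same scale as a unit-level average.
    scaled_totals = totals * M / N

    def diff_in_means(values, arm):
        # OLS slope on a binary regressor (with intercept) is a difference in means.
        return values[arm == 1].mean() - values[arm == 0].mean()

    z_unit = z[cluster]  # expand the cluster-level assignment to units
    return {
        "individual": diff_in_means(y, z_unit),
        "cluster_average": diff_in_means(averages, z),
        "cluster_total": diff_in_means(scaled_totals, z),
    }


# Toy data (made up): four clusters of sizes 1, 3, 2 and 2.
y = [4, 1, 2, 3, 5, 7, 2, 4]
cluster = [0, 1, 1, 1, 2, 2, 3, 3]
z = [1, 0, 1, 0]

est = cluster_estimators(y, cluster, z)
for name, value in est.items():
    print(f"{name}: {value:.4f}")
```

With unequal cluster sizes the three point estimates do not coincide on this toy data set; with equal cluster sizes all three would reduce to the same difference in means.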
INTRODUCTION

Cluster-randomized experiments, also known as group-randomized trials, are widely used in empirical research, where randomization over units within clusters is either unethical or logistically infeasible. For example, in many public health interventions, clusters are villages, units are households and randomization is implemented at the village level (Donner & Klar, 2000; Turner et al., 2017a,b); in many educational interventions, clusters are classrooms, units are students and randomization is implemented at the classroom level (Raudenbush, 1997; Raudenbush & Schwartz, 2020; Schochet, 2013; Schochet et al., 2021). In addition to their convenience, cluster-randomized experiments can circumvent the problem of interference among units and are policy relevant if the intervention of a policy is at the cluster level.

A proper analysis of a cluster-randomized experiment must first clearly specify the population and parameter of interest and then address the fact that randomization is at the cluster level. Model-based analyses address these issues simultaneously by imposing certain parametric assumptions and correlation structure on the error terms (Donner & Klar, 2000; Graubard & Korn, 1994; Green & Vavreck, 2008). However, these modelling assumptions are often too strong and can lead to bias under model misspecifications. We take an alternative perspective by first defining the parameter of interest based on potential outcomes for treatment effects and then deriving the properties of the regression estimators under t...