Cluster randomized trials (CRTs) were originally proposed for settings in which randomization at the subject level is infeasible or may severely bias the estimate of the treatment effect. However, recruiting an additional cluster costs more than enrolling an additional subject in an individually randomized trial. Optimal sample sizes under budget constraints have been proposed for two-level CRTs. CRTs may also have a three-level structure, in which two levels of clustering must be taken into account. In this paper, we propose optimal designs for three-level CRTs with a binary outcome, assuming a nested exchangeable correlation structure in generalized estimating equation models. We provide the variances of the estimators of three commonly used measures: risk difference, risk ratio, and odds ratio. For a given sampling budget, we discuss how many clusters and how many subjects per cluster are needed to minimize the variance of each estimator. When the association parameters are known, we propose the locally optimal design. When the association parameters are unknown but lie within predetermined ranges, we propose the MaxiMin design, which maximizes the minimum relative efficiency over those ranges, that is, it minimizes the risk under the worst-case scenario.

KEYWORDS
cluster randomized trial (CRT), dissemination and implementation, generalized estimating equation (GEE), intracluster correlation coefficient (ICC), nested correlation structure

Statistics in Medicine. 2019;38:3733-3746. wileyonlinelibrary.com/journal/sim
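
To make the budget-constrained design problem summarized in the abstract concrete, here is a minimal Python sketch for the risk difference. It is an illustration under stated assumptions, not the paper's derivation. The variance uses the standard design effect for equal cluster sizes under a nested exchangeable correlation, 1 + (m - 1)ρ1 + m(n - 1)ρ2, together with a plug-in binomial variance. The linear cost model, the "subcluster" terminology, the function and parameter names, and all numerical values are hypothetical. A plain grid search stands in for any analytic optimum.

```python
from itertools import product


def design_effect(n, m, rho1, rho2):
    """Variance inflation for one cluster of n subclusters with m subjects each.

    rho1: correlation between subjects in the same subcluster.
    rho2: correlation between subjects in different subclusters of one cluster.
    """
    return 1 + (m - 1) * rho1 + m * (n - 1) * rho2


def clusters_per_arm(n, m, budget, c3, c2, c1):
    """Clusters affordable per arm under an ASSUMED linear cost model:
    c3 per cluster, c2 per subcluster, c1 per subject, two equal arms."""
    cost_per_cluster = c3 + n * (c2 + m * c1)
    return int(budget // (2 * cost_per_cluster))


def var_risk_difference(n, m, rho1, rho2, p0, p1, budget, c3, c2, c1):
    """Approximate variance of the estimated risk difference p1 - p0."""
    k = clusters_per_arm(n, m, budget, c3, c2, c1)
    if k < 1:
        return float("inf")  # design is not affordable
    binomial_part = p0 * (1 - p0) + p1 * (1 - p1)
    return binomial_part * design_effect(n, m, rho1, rho2) / (k * n * m)


def locally_optimal(rho1, rho2, grid, **kw):
    """Design (n, m) on the grid with the smallest variance for known rho1, rho2."""
    return min(grid, key=lambda d: var_risk_difference(d[0], d[1], rho1, rho2, **kw))


def maximin(rho_pairs, grid, **kw):
    """Design maximizing the minimum relative efficiency over rho_pairs."""
    # Best achievable variance at each candidate (rho1, rho2).
    best = {}
    for rho1, rho2 in rho_pairs:
        n, m = locally_optimal(rho1, rho2, grid, **kw)
        best[(rho1, rho2)] = var_risk_difference(n, m, rho1, rho2, **kw)

    def min_rel_eff(design):
        n, m = design
        return min(
            best[r] / var_risk_difference(n, m, r[0], r[1], **kw)
            for r in rho_pairs
        )

    choice = max(grid, key=min_rel_eff)
    return choice, min_rel_eff(choice)


# Hypothetical inputs, chosen only for illustration.
costs = dict(p0=0.30, p1=0.20, budget=100_000, c3=2_000, c2=300, c1=50)
grid = list(product(range(1, 21), range(1, 51)))  # (n subclusters, m subjects)
rho_pairs = list(product([0.05, 0.10, 0.20], [0.01, 0.03, 0.05]))

local_design = locally_optimal(0.10, 0.03, grid, **costs)
design, worst_re = maximin(rho_pairs, grid, **costs)
print("locally optimal (n, m):", local_design)
print("MaxiMin (n, m):", design, "worst-case relative efficiency:", worst_re)
```

The relative efficiency of a candidate design at a given (ρ1, ρ2) is taken here as the variance of the locally optimal design divided by the variance of the candidate, so it never exceeds 1. The MaxiMin choice is the design whose smallest such ratio across the assumed ranges is largest.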