Background: Evaluation studies frequently draw on fallible outcomes that contain significant measurement error. Ignoring outcome measurement error in the planning stages can undermine the sufficiency and efficiency of an otherwise well-designed study and can further constrain the evidence studies bring to bear on the effectiveness of programs. Objectives: We develop simple formulas to adjust statistical power, minimum detectable effect (MDE), and optimal sample allocation formulas for two-level cluster- and multisite-randomized designs when the outcome is subject to measurement error. Results: The resulting adjusted formulas suggest that outcome measurement error typically amplifies treatment effect uncertainty, reduces power, increases the MDE, and undermines the efficiency of conventional optimal sampling schemes. Therefore, achieving adequate power for a given effect size will typically demand increased sample sizes when considering fallible outcomes, while maintaining design efficiency will require increasing portions of a budget be applied toward sampling a larger number of individuals within clusters. We illustrate evaluation planning with the new formulas while comparing them to conventional formulas using hypothetical examples based on recent empirical studies. To encourage adoption of the new formulas, we implement them in the R package PowerUpR and in the PowerUp software.