Infant research is often underpowered, undermining the robustness and replicability of our findings. Improving the reliability of infant studies offers a solution for increasing statistical power independent of sample size. Here, we discuss two senses of the term reliability in the context of infant research: reliable (large) effects and reliable measures. We examine the circumstances under which effects are strongest and measures are most reliable, and use synthetic datasets to illustrate the relationship between effect size, measurement reliability, and statistical power. We then present six concrete solutions for more reliable infant research: (1) routinely estimating and reporting the effect size and measurement reliability of infant tasks, (2) selecting the best measurement tool, (3) developing better infant paradigms, (4) collecting more data points per infant, (5) excluding unreliable data from analysis, and (6) conducting more sophisticated data analyses. Deeper consideration of measurement in infant research will improve our ability to study infant development.
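The relationship between effect size, measurement reliability, and statistical power can be illustrated with a minimal sketch. This is not the synthetic-data analysis referred to above. It assumes classical test theory, in which measurement error inflates observed variance so that an observed standardized effect equals the true effect multiplied by the square root of the reliability, and it approximates power with a two-sided one-sample z-test. The function names, the true effect of d = 0.5, and the sample size of 24 are all hypothetical choices made for illustration.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and SD 1


def attenuated_effect(d_true: float, reliability: float) -> float:
    """Observed standardized effect under classical test theory.

    Assumption: reliability is the ratio of true-score variance to
    observed variance, so the observed SD is inflated by 1/sqrt(reliability).
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return d_true * sqrt(reliability)


def approx_power(d_obs: float, n: int, alpha: float = 0.05) -> float:
    """Approximate two-sided power of a one-sample z-test with n infants."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d_obs * sqrt(n)  # expected value of the test statistic
    upper_tail = 1 - _STD_NORMAL.cdf(z_crit - shift)
    lower_tail = _STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


if __name__ == "__main__":
    # Hypothetical values: true d = 0.5, n = 24 infants, varying reliability.
    for rel in (0.3, 0.5, 0.7, 0.9):
        d_obs = attenuated_effect(0.5, rel)
        power = approx_power(d_obs, 24)
        print(f"reliability={rel:.1f}  observed d={d_obs:.2f}  power={power:.2f}")
```

Under these assumptions, power rises with reliability while the sample size stays fixed, which is the sense in which more reliable measurement can increase power independent of sample size.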