Informal learning institutions, such as museums, science centers, and community-based organizations, play a critical role in providing opportunities for students to engage in science, technology, engineering, and mathematics (STEM) activities during out-of-school time. In recent years, thousands of studies, evaluations, and conference proceedings have been published measuring the impact these programs have on their participants. However, because studies of informal science education (ISE) programs vary considerably in both design and design quality, assessing their impact on participants is often difficult. Knowing whether the outcomes reported by these studies are supported by sufficient evidence is important not only for maximizing participant impact, but also because considerable economic and human resources are invested in informal learning initiatives. To address this problem, I used the theories of impact analysis and triangulation as a framework for developing user-friendly rubrics for assessing the quality of research designs and the evidence of impact. Drawing on two main sources, research-based recommendations from STEM governing bodies and feedback from a focus group, I identified criteria indicative of high-quality STEM research and study design. From these criteria, I developed three STEM Research Design Rubrics, one each for quantitative, qualitative, and mixed methods studies, that ISE researchers, practitioners, and evaluators can use to assess research design quality. Likewise, I developed three STEM Impact Rubrics, again one each for quantitative, qualitative, and mixed methods studies, for assessing evidence of outcomes. The rubrics developed in this study are practical tools that ISE researchers, practitioners, and evaluators can use to strengthen the field of informal science learning, both by improving the quality of study designs and by discerning whether studies and program evaluations provide sufficient evidence of impact.