Large-scale, high-stakes testing induces consequences for its stakeholders, intended and unintended, positive and negative, within a set of relationships between testing, teaching, and learning. At the core of this phenomenon lies the (mis)use of test scores and of the values and stakes attached to a test, both in society and in the pedagogical context where that test exists. In applied linguistics, washback and impact refer to two levels of effect: “impact” denotes the effects of tests at the macro level of education and society, and “washback” denotes their effects at the micro level of classroom teaching and learning. Over the past two decades, empirical research has established the relationship between testing and teaching. The same period has seen a growing number of studies on learning and learners and on other stakeholders such as publishers, parents, and employers, along with an increasing focus on the importance of contextual factors in testing. The challenges for future research center on the call to collect validity evidence from multiple stakeholders, using multiple methods, to understand this complex phenomenon.