This paper describes, and argues for, a procedure to help groups of reviewers judge the quality of prior research reports. It explains why such a procedure is needed, and shows that existing approaches are relevant only to some kinds of research, so that a review or synthesis cannot successfully combine quality judgements across different types of research. The proposed procedure is based on four main factors: the fit between the research question(s) for any study and its design(s); the size of the smallest group of cases used in the headline analyses; the amount and skewness of missing data; and the quality of the data collected. This simple procedure is now relatively widely used, and has been found to lead to widespread agreement between reviewers. It can fundamentally change the findings of a review of evidence, compared to the conclusions that would emerge from a more traditional review that did not include genuine quality rating of prior evidence. Importantly, because it is not technical, it also permits users to help judge research findings. This matters because there is a growing demand for evidence-led approaches in areas of social science such as education, where summaries of evidence must be as trustworthy as possible.