Engaging students in creating learning resources has demonstrated pedagogical benefits. However, to effectively utilise a repository of student-generated content (SGC), a selection process is needed to separate high- from low-quality resources, as some of the resources created by students can be ineffective, inappropriate, or incorrect. A common and scalable approach is a peer review process in which students are asked to assess the quality of resources authored by their peers. Given that the judgements of students, as experts-in-training, cannot be wholly relied upon, a redundancy-based method is widely employed in which the same assessment task is given to multiple students. However, this approach introduces a new challenge, referred to as the consensus problem: How can we assign a final quality to a resource given ratings by multiple students? To address this challenge, we investigate the predictive performance of 18 inference models across five well-established categories of consensus approaches for inferring the quality of SGC at scale. The analysis is based on the engagement of 2,141 undergraduate students across five courses in creating 12,803 resources and 77,297 peer reviews. Results indicate that the quality of reviews is quite diverse and that students tend to overrate. Consequently, simple statistics such as the mean and median fail to identify poor-quality resources. Findings further suggest that incorporating advanced probabilistic and text analysis methods to infer reviewers' reliability and review quality improves performance; however, there remains an evident need for instructor oversight and for training students to write compelling and reliable reviews.
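As a concrete illustration of the consensus problem, the sketch below contrasts plain mean/median aggregation with a simple reliability-weighted scheme in which reviewers whose ratings deviate more from the current consensus are down-weighted. This is a minimal, hypothetical example, not one of the 18 models evaluated in this work; the weighting rule, function names, rating scale, and toy data are all assumptions made for illustration.

```python
import numpy as np

# Illustrative sketch only (not one of the paper's models): compares a plain
# mean/median consensus with an iterative reliability-weighted consensus.
# Ratings are on an assumed 1-5 scale; np.nan marks a missing review.

def simple_consensus(ratings):
    """Baseline: plain mean and median of the ratings a resource received."""
    return np.mean(ratings), np.median(ratings)

def reliability_weighted_consensus(rating_matrix, n_iters=20, eps=1e-6):
    """Iteratively re-estimate per-reviewer reliability and per-resource quality.

    rating_matrix: 2D array (resources x reviewers), np.nan where a reviewer
    did not rate a resource. Returns (quality per resource, weight per reviewer).
    """
    mask = ~np.isnan(rating_matrix)
    quality = np.nanmean(rating_matrix, axis=1)   # start from plain means
    weights = np.ones(rating_matrix.shape[1])     # start with equal trust

    for _ in range(n_iters):
        # Reliability: inverse of a reviewer's mean squared deviation from consensus.
        sq_err = (rating_matrix - quality[:, None]) ** 2
        mse = np.nansum(sq_err, axis=0) / np.maximum(mask.sum(axis=0), 1)
        weights = 1.0 / (mse + eps)
        # Quality: reliability-weighted average of each resource's ratings.
        w = np.where(mask, weights[None, :], 0.0)
        quality = (np.nansum(np.where(mask, rating_matrix, 0.0) * w, axis=1)
                   / np.maximum(w.sum(axis=1), eps))
    return quality, weights

if __name__ == "__main__":
    # Toy example: 3 resources, 4 reviewers; reviewer 3 systematically overrates.
    R = np.array([[4.0, 4.0, 5.0, 2.0],
                  [2.0, 3.0, 5.0, 3.0],
                  [5.0, 4.0, 5.0, np.nan]])
    print("mean/median:", [simple_consensus(r[~np.isnan(r)]) for r in R])
    print("weighted   :", reliability_weighted_consensus(R)[0])
```

In the toy data, the weighted estimate is pulled less toward the overrating reviewer than the plain mean, which mirrors the abstract's observation that simple statistics are vulnerable to systematic overrating.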