Polygenic scores (PGSs), which assess the genetic risk of individuals for a disease, are calculated as a weighted count of risk alleles identified in genome-wide association studies (GWASs). PGS methods differ in which DNA variants are included in the score and in the weights assigned to them. PGSs are evaluated in independent target samples of individuals with known disease status. New PGS methods are evaluated using simulated data or a single target cohort; however, in real data sets there can be heterogeneity between target cohorts, which could reflect a number of real or artefactual factors. The Psychiatric Genomics Consortium working groups for schizophrenia (SCZ) and major depressive disorder (MDD) bring together many independently collected case-control cohorts for GWAS meta-analysis. These resources are used here in repeated leave-one-cohort-out GWAS analyses, which support robust conclusions about PGS prediction across multiple target (left-out) cohorts. Eight PGS methods (P+T, SBLUP, LDpred-Inf, LDpred-funct, LDpred, PRS-CS, PRS-CS-auto, SBayesR) are compared. We found that SBayesR had the highest prediction evaluation statistics in most comparisons. For SCZ across 30 target cohorts, the SBayesR PGS achieved a mean area under the receiver operating characteristic curve (AUC) of 0.733 and explained 9.9% of variance on the liability scale. For MDD across 26 target cohorts, the AUC and variance explained were 0.601 and 4.0%, respectively. The variance explained by the SBayesR PGS was 46% and 43% higher for SCZ and MDD, respectively, than that explained by the basic p-value thresholding P+T method.
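As an illustration of the two ideas above (a PGS as a weighted count of risk alleles, and evaluation of that PGS in each left-out cohort by AUC), the following minimal Python sketch may help. It is not the pipeline used in the study: the function names, the toy cohorts, and the toy weights are all hypothetical, and the `weights_for` callable stands in for whatever PGS method would re-estimate SNP weights from a GWAS meta-analysis that excludes the named cohort. Here it simply returns fixed weights.

```python
from statistics import mean


def polygenic_score(dosages, weights):
    """Weighted count of risk alleles: sum over SNPs of weight * allele dosage (0, 1 or 2)."""
    return sum(w * d for w, d in zip(weights, dosages))


def auc(case_scores, control_scores):
    """Probability that a random case scores higher than a random control (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def leave_one_cohort_out_auc(cohorts, weights_for):
    """Score each cohort with weights obtained without it; return per-cohort and mean AUC.

    cohorts: dict mapping cohort name -> list of (dosages, is_case) tuples.
    weights_for: hypothetical callable; given the left-out cohort name, it returns
                 SNP weights estimated from the remaining cohorts.
    """
    per_cohort = {}
    for name, individuals in cohorts.items():
        weights = weights_for(name)
        cases = [polygenic_score(g, weights) for g, is_case in individuals if is_case]
        controls = [polygenic_score(g, weights) for g, is_case in individuals if not is_case]
        per_cohort[name] = auc(cases, controls)
    return per_cohort, mean(list(per_cohort.values()))


# Toy data (invented for illustration): three SNPs, two cohorts, four individuals each.
toy_cohorts = {
    "cohort_A": [([2, 1, 0], True), ([1, 1, 1], True), ([0, 0, 1], False), ([1, 0, 0], False)],
    "cohort_B": [([2, 2, 1], True), ([0, 1, 0], True), ([1, 1, 0], False), ([0, 0, 2], False)],
}
toy_weights = [0.3, 0.2, -0.1]  # assumed fixed here; a real analysis re-estimates per left-out cohort

per_cohort_auc, mean_auc = leave_one_cohort_out_auc(toy_cohorts, lambda left_out: toy_weights)
print(per_cohort_auc, mean_auc)
```

Reporting the per-cohort values alongside their mean makes between-cohort heterogeneity visible, which a single target cohort cannot show.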