Polygenic scores (PGS) are individual-level measures that quantify the genetic contribution to a given trait. PGS have predominantly been developed using European-ancestry samples, and recent studies have shown that European-ancestry-derived PGS predict less accurately in non-European-ancestry samples, reflecting differences in linkage disequilibrium, variant frequencies, and variant effects across populations. However, how best to maximize performance within any one ancestry group given the available data, and how much this varies between traits, remain largely unexplored. Here, we investigate the effect of sample size and ancestry composition on the predictive performance of PGS for fifteen traits in UK Biobank, and evaluate an importance reweighting approach that aims to counteract the under-representation of certain groups in training data. We find that, for a minority of traits, PGS estimated using a relatively small number of Black/Black British individuals outperformed, on a Black/Black British test set, scores estimated using a much larger number of White individuals. For example, a PGS for mean corpuscular volume trained only on Black/Black British individuals achieved a 4-fold improvement over a corresponding PGS trained only on White individuals. For the remaining traits, the reverse was true: a PGS for height trained only on Black/Black British individuals explained less than 0.5% of the variance in height in a Black/Black British test set, compared with 3.9% for a PGS trained on a much larger set of White individuals. While importance reweighting provides a moderate benefit for some traits (for example, a 40% improvement for mean corpuscular volume relative to no reweighting), the improvement is modest in most cases, suggesting that only targeted collection of data from underrepresented groups can address differences in PGS performance.
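The abstract does not specify how the importance weights are computed. As an illustration only, the sketch below shows one common form of importance weighting under covariate shift, assuming that: (a) each training individual is weighted by an estimated density ratio p_target(x)/p_train(x) over ancestry features x (here, two toy principal components), obtained from a logistic classifier that separates target-group samples from training samples; (b) per-variant effects are then fitted by weighted ridge regression; and (c) "variance explained" is taken as the squared correlation between score and phenotype. All data are simulated, and every function name (`fit_logistic`, `importance_weights`, `weighted_ridge`, `r2`, `simulate`) is hypothetical. This is not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)


def fit_logistic(features, labels, n_iter=25, l2=1e-3):
    """Logistic regression by Newton-Raphson; returns [intercept, coefs...]."""
    X = np.column_stack([np.ones(len(features)), features])
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (labels - p) - l2 * theta
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + l2 * np.eye(X.shape[1])
        theta += np.linalg.solve(hess, grad)
    return theta


def importance_weights(train_feats, target_feats):
    """Estimate w(x) = p_target(x) / p_train(x) with a probabilistic classifier.

    Target samples are labelled 1 and training samples 0, so
    p_target(x) / p_train(x) = odds(x) * n_train / n_target.
    """
    feats = np.vstack([train_feats, target_feats])
    labels = np.r_[np.zeros(len(train_feats)), np.ones(len(target_feats))]
    theta = fit_logistic(feats, labels)
    odds = np.exp(theta[0] + train_feats @ theta[1:])
    w = odds * len(train_feats) / len(target_feats)
    return w / w.mean()  # rescale so the weights average to 1


def weighted_ridge(geno, pheno, weights, lam=1.0):
    """Effects minimising sum_i w_i (y_i - x_i b)^2 + lam * ||b||^2 on centred data."""
    w = weights / weights.sum()
    Xc = geno - w @ geno  # weighted centring of genotypes
    yc = pheno - w @ pheno  # weighted centring of phenotype
    A = Xc.T @ (Xc * weights[:, None]) + lam * np.eye(geno.shape[1])
    return np.linalg.solve(A, Xc.T @ (weights * yc))


def r2(pgs, pheno):
    """Variance explained, taken here as the squared Pearson correlation."""
    return np.corrcoef(pgs, pheno)[0, 1] ** 2


def simulate(n, pc_mean, freqs, betas):
    """Toy cohort: 2 ancestry PCs, additive genotypes (0/1/2), noisy phenotype."""
    pcs = rng.normal(pc_mean, 1.0, size=(n, 2))
    geno = rng.binomial(2, freqs, size=(n, len(freqs))).astype(float)
    pheno = geno @ betas + rng.normal(0.0, 1.0, size=n)
    return pcs, geno, pheno


# Toy setting: large group A, small group B; allele frequencies and effects differ.
n_snps, n_a, n_b = 50, 2000, 200
freq_a = rng.uniform(0.05, 0.5, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.1, n_snps), 0.02, 0.98)
beta_a = rng.normal(0.0, 0.2, n_snps)
beta_b = 0.7 * beta_a + rng.normal(0.0, 0.1, n_snps)  # partly shared effects

pcs_a, geno_a, y_a = simulate(n_a, [0.0, 0.0], freq_a, beta_a)
pcs_b, geno_b, y_b = simulate(n_b, [3.0, 0.0], freq_b, beta_b)
pcs_t, geno_t, y_t = simulate(500, [3.0, 0.0], freq_b, beta_b)  # group-B test set

train_pcs = np.vstack([pcs_a, pcs_b])
train_geno = np.vstack([geno_a, geno_b])
train_y = np.r_[y_a, y_b]

weights = importance_weights(train_pcs, pcs_t)  # uses test PCs only, not phenotypes
r2_uniform = r2(geno_t @ weighted_ridge(train_geno, train_y, np.ones(len(train_y))), y_t)
r2_weighted = r2(geno_t @ weighted_ridge(train_geno, train_y, weights), y_t)
print(f"R^2 unweighted: {r2_uniform:.3f}  importance-weighted: {r2_weighted:.3f}")
```

In this toy setup the weights up-weight training individuals whose ancestry features resemble the target group and down-weight the rest, so the fit leans toward the under-represented group without discarding the larger one. Whether this helps depends on how much effects and allele frequencies differ between groups; density-ratio weights can also have high variance, which is why practical implementations often clip or smooth them.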