Predicting phenotypes from DNA, referred to as Forensic DNA Phenotyping, has recently become an extensively studied field in forensic research. Systems based on single nucleotide polymorphisms (SNPs) have recently been introduced for accurate prediction of iris, hair and skin color in the global population, independent of bio-geographical ancestry. Here, we analyzed 14 SNPs for distinct skin pigmentation traits in a homogeneous cohort of 222 Polish subjects. We compared three different algorithms, a General Linear Model based on logistic regression, Random Forest and a Neural Network, across 18 developed prediction models. Random Forest was the most accurate algorithm for 3- and 4-category estimations (58.3% total correct calls for skin color prediction, 47.2% for tanning prediction and 50% for freckling prediction). Binomial logistic regression was the best approach for 2-category estimations (69.4% total correct calls and AUC = 0.673 for tanning prediction; 52.8% total correct calls and AUC = 0.537 for freckling prediction). Our study confirms the association of rs12913832 (HERC2) with all three skin pigmentation traits, as well as the associations of variants linked only to particular pigmentation traits: rs6058017 and rs4911414 (ASIP) with skin sensitivity to sun and tanning ability, rs12203592 (IRF4) with freckling, and rs4778241 and rs4778138 (OCA2) with skin color and tanning. Finally, we identified significant differences in allele frequencies compared with CEU data. Our study provides a starting point for the development of prediction models for homogeneous populations with less internal differentiation than is encountered in global predictive testing.
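To make the 2-category evaluation metrics above concrete (total correct calls and AUC), the following is a minimal sketch, not the authors' pipeline. It assumes SNP genotypes are encoded additively as minor-allele counts (0, 1 or 2), fits a binomial logistic regression by plain batch gradient descent, and computes the percentage of correct calls at an assumed 0.5 threshold together with the AUC in its Mann-Whitney form. All function names, the encoding, the learning rate and the threshold are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence, Tuple


def encode_additive(genotype: str, minor: str) -> int:
    """Count copies of the (assumed) minor allele in a two-letter genotype, e.g. 'AG' -> 1."""
    return sum(1 for allele in genotype if allele == minor)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Binomial logistic regression fitted by batch gradient descent on the mean log-loss."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Residual between predicted probability and the 0/1 label.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, X: List[List[float]]) -> List[float]:
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def percent_correct(probs: Sequence[float], y: Sequence[int], threshold: float = 0.5) -> float:
    """Percentage of samples whose thresholded call matches the observed category."""
    calls = [1 if pr >= threshold else 0 for pr in probs]
    return 100.0 * sum(c == t for c, t in zip(calls, y)) / len(y)


def auc(probs: Sequence[float], y: Sequence[int]) -> float:
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [pr for pr, t in zip(probs, y) if t == 1]
    neg = [pr for pr, t in zip(probs, y) if t == 0]
    total = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                total += 1.0
            elif pp == pn:
                total += 0.5
    return total / (len(pos) * len(neg))
```

With 14 SNPs, each row of `X` would hold 14 encoded genotypes. An AUC near 0.5, as reported here for freckling, means the model ranks cases barely better than chance. For the 3- and 4-category outcomes, a multiclass method such as the Random Forest compared in the study would take the place of this binary classifier.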
Electronic supplementary material
The online version of this article (10.1007/s00439-019-02012-w) contains supplementary material, which is available to authorized users.