Recent reports have identified differences in the mutational spectra across human populations. While some of these differences have been replicated in other cohorts, most have been reported only in the 1000 Genomes Project (1kGP) data. While investigating an intriguing putative population stratification within the Japanese population, we identified a previously unreported batch effect. This batch effect produces spurious mutation calls in the 1kGP data and accounts for the apparent population stratification. Because the 1kGP data are used extensively, we find that the batch effects also lead to incorrect imputation by leading imputation servers and to suspicious GWAS associations. Lower-quality data from the early phases of the 1kGP thus continue to contaminate modern studies in hidden ways. It may be time to retire or upgrade such legacy sequencing data.

Cohorts, Imputation

… Consortium et al., 2015), and the Simons Diversity project (Mallick et al., 2016), for example, have made thousands of genome sequences publicly available for population and medical genetic analyses. Many more genomes are available indirectly through servers that provide imputation services (McCarthy et al., 2016) or summary statistics for variant frequency estimation (Lek et al., 2016).

The first genomes in the 1kGP were sequenced 10 years ago (van Dijk et al., 2014). Since then, sequencing platforms have improved rapidly. The second phase of the 1kGP implemented multiple technological and analytical improvements over its earlier phases (1000 Genomes Project Consortium, 2012; Consortium et al., 2015). As a result, sample preparation and data quality are heterogeneous across the course of the project.

Yet, because freely available data are so valuable, early data from the 1kGP are still widely used to impute untyped variants, to estimate allele frequencies, and to answer a wide range of medical and evolutionary questions.
This raises the question of whether and how such legacy data should be included in contemporary analyses alongside more recent cohorts. Here we show how large, previously unreported batch effects in the early phases of the 1kGP still lead to incorrect genetic conclusions in population genetic analyses. Through imputation with the 1kGP as a reference, they also lead to spurious GWAS associations.