Here we provide 129 complete mitochondrial control region sequences of indigenous Khoe-San individuals from Angola to contribute to the still underrepresented pool of data from Africa. The dataset consists exclusively of African lineages, with a majority of Sub-Saharan haplogroups. The probability of a random match was calculated as 0.09. The dataset comprises 21 haplotypes occurring more than once and 17 unique haplotypes. Upon publication, haplotypes were incorporated in the EMPOP database (www.empop.org).
Only a few Angolan mtDNA sequences were available in the literature, and these were typed for the first and second hypervariable segments of the control region [11][12][13].
The blood samples for this study came from 129 healthy, randomly drawn volunteer donors who were, to our knowledge, unrelated; the samples were collected in the Schmidtsdrift community. Individuals identified themselves as !Xu (113) and Khwe (13); the ethnicity of the remaining three individuals is unknown. Informed written or oral consent was obtained from all human subjects. Ethics approval for the collection of samples for both biochemical and genetic studies on the Bushmen was obtained by the Hans Snykers Institute from the Faculty of Medicine, University of Pretoria Ethics Committee.
DNA extraction, amplification and sequencing
Genomic DNA was extracted from peripheral blood as described in [14]. The full mitochondrial control region (CR) was amplified, sequenced and interpreted as reported in [15].
Data analysis
According to our in-house data quality management process, the resulting consensus sequences were inspected independently by two analysts using the sequence analysis software Sequencher (Version 4.8) and reviewed by a third scientist. Consensus sequences covered a common reading frame from position 16024 to 576 and were reported as differences to the rCRS [16] following updated nomenclature guidelines for mtDNA [17].
Haplogroups were assigned according to Phylotree, build 12 [18]. Within our Khoe-San sample set, the random match probability was calculated as the sum of squared CR haplotype frequencies, disregarding length variants at positions 16193, 309 and 573. The haplotypes from this study will be available on the EMPOP database (www.empop.org) upon publication (EMPOP accession number EMP00069).
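The random match probability calculation described above can be sketched as follows. This is a minimal illustration and not the authors' actual pipeline. It assumes each sample is given as a list of differences to the rCRS written as strings, and that length variants appear as insertions in dotted notation (for example `309.1C`). The function names and the five toy samples are hypothetical and do not come from the study data.

```python
from collections import Counter

# Positions whose length variants are disregarded, as stated in the text.
IGNORED_LENGTH_POSITIONS = {"16193", "309", "573"}


def strip_length_variants(differences):
    """Return the haplotype without insertions at the ignored positions.

    Assumption: insertions are written in dotted form, e.g. '309.1C'.
    """
    kept = []
    for variant in differences:
        position, dot, _ = variant.partition(".")
        if dot and position in IGNORED_LENGTH_POSITIONS:
            continue  # length variant at an ignored position
        kept.append(variant)
    return frozenset(kept)


def random_match_probability(samples):
    """Sum of squared haplotype frequencies over all samples."""
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    counts = Counter(strip_length_variants(s) for s in samples)
    return sum((count / n) ** 2 for count in counts.values())


# Hypothetical toy data: five samples that collapse to three haplotypes
# once the length variants at 309 and 573 are disregarded.
samples = [
    ["16129A", "73G", "309.1C"],
    ["16129A", "73G"],
    ["16223T", "263G"],
    ["16223T", "263G", "573.1C"],
    ["16187T", "152C"],
]

rmp = random_match_probability(samples)
print(round(rmp, 2))  # 0.36 = 0.4^2 + 0.4^2 + 0.2^2
```

In the toy data, the first two and the next two samples differ only by a disregarded length variant, so each pair counts as a single haplotype with frequency 0.4.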
Results
Observed CR haplotypes and diversity indices
Analysis of the entire control region revealed exclusively Africa-specific L lineages, the majority of Sub-Saharan origin (Table S1). Within our dataset, 90.7% of haplotypes belonged to L0. Remarkably, 97.4% of all L0 haplotypes (88.4% of the observed Khoe-San dataset) belonged to the Khoe-San-specific L0d/L0k cluster, which is reported to account for ~60% in observed Khoe-San populations [19]. Among these, the most frequent haplogroups within the Angolan Khoe-San samples were L0d1c1 (33.3%) and L0k1 (27.1%). We also identified a minor contribution of 9.3% from haplogroups L2 and L3 within the Khoe-San. The absence of lineages L1, L5, L6, and L4 in our dataset corresponds to their docu...