Background: Allele-specific (AS) Polymerase Chain Reaction is a convenient and inexpensive method for genotyping Single Nucleotide Polymorphisms (SNPs) and mutations. It is applied in many recent studies, including population genetics, molecular genetics and pharmacogenomics. Designing primers with existing AS primer design tools is cumbersome for inexperienced users, since information about the SNP/mutation must be acquired from public databases prior to the design. Furthermore, most of these tools do not offer mismatch enhancement of the designed primers. The available web applications provide neither a user-friendly graphical input interface nor an intuitive visualization of their primer results.
Background: Non-random patterns of genetic variation exist among individuals in a population owing to a variety of evolutionary factors. Therefore, populations are structured into genetically distinct subpopulations. As genotypic datasets become ever larger, it is increasingly difficult to correctly estimate the number of subpopulations and assign individuals to them. The computationally efficient non-parametric, chiefly Principal Components Analysis (PCA)-based methods are thus becoming increasingly relied upon for population structure analysis. Current PCA-based methods can accurately detect structure; however, the accuracy in resolving subpopulations and assigning individuals to them is wanting. When subpopulations are closely related to one another, they overlap in PCA space and appear as a conglomerate. This problem is exacerbated when some subpopulations in the dataset are genetically far removed from others. We propose a novel PCA-based framework which addresses this shortcoming. Results: A novel population structure analysis algorithm called iterative pruning PCA (ipPCA) was developed which assigns individuals to subpopulations and infers the total number of subpopulations present. Genotypic data from simulated and real population datasets with different degrees of structure were analyzed. For datasets with simple structures, the subpopulation assignments of individuals made by ipPCA were largely consistent with the STRUCTURE, BAPS and AWclust algorithms. On the other hand, highly structured populations containing many closely related subpopulations could be accurately resolved only by ipPCA, and not by other methods. Conclusion: The algorithm is computationally efficient and not constrained by the dataset complexity.
This systematic subpopulation assignment approach removes the need for prior population labels, which could be advantageous when cryptic stratification is encountered in datasets containing individuals otherwise assumed to belong to a homogeneous population.
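The iterative idea described in this abstract can be illustrated with a minimal sketch: run PCA on a group of individuals, split the group in two, and repeat on each half until no further structure is found. The abstract does not state the splitting rule or the stopping test, so both are assumptions here: the split uses the sign of each individual's score on the first principal component, and recursion stops when that component explains less than a hypothetical `min_explained` share of the variance. All function and parameter names (`center`, `first_pc`, `ip_pca`, `min_size`, `min_explained`) are illustrative and not taken from the ipPCA software.

```python
import random

def center(rows):
    """Subtract each marker's mean so PCA sees deviations only."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[r[j] - means[j] for j in range(m)] for r in rows]

def first_pc(X, iters=200, seed=0):
    """Leading principal component by power iteration on X^T X.

    Returns (score of each individual on PC1, fraction of variance on PC1).
    """
    total = sum(x * x for row in X for x in row)
    if total == 0:  # all individuals identical: nothing to split
        return [0.0] * len(X), 0.0
    rng = random.Random(seed)
    m = len(X[0])
    v = [rng.gauss(0, 1) for _ in range(m)]
    for _ in range(iters):
        scores = [sum(x * w for x, w in zip(row, v)) for row in X]        # X v
        v = [sum(s * row[j] for s, row in zip(scores, X)) for j in range(m)]  # X^T (X v)
        norm = sum(w * w for w in v) ** 0.5
        if norm == 0:
            return [0.0] * len(X), 0.0
        v = [w / norm for w in v]
    scores = [sum(x * w for x, w in zip(row, v)) for row in X]
    return scores, sum(s * s for s in scores) / total

def ip_pca(genotypes, ids=None, min_size=3, min_explained=0.5):
    """Recursively split individuals along PC1 (simplified stand-in rules).

    Assumed stopping rules, not the published ones: a group is final when it
    has fewer than min_size members or PC1 explains < min_explained.
    """
    if ids is None:
        ids = list(range(len(genotypes)))
    if len(ids) < min_size:
        return [ids]
    scores, explained = first_pc(center([genotypes[i] for i in ids]))
    if explained < min_explained:
        return [ids]
    left = [i for i, s in zip(ids, scores) if s < 0]
    right = [i for i, s in zip(ids, scores) if s >= 0]
    if not left or not right:
        return [ids]
    return (ip_pca(genotypes, left, min_size, min_explained)
            + ip_pca(genotypes, right, min_size, min_explained))

# Toy genotypes (0/1/2 allele counts, 12 markers): group A is far from B and C,
# while B and C differ at only two markers.
genotypes = ([[0] * 12] * 4) + ([[2] * 10 + [0] * 2] * 4) + ([[2] * 12] * 4)
clusters = ip_pca(genotypes)
print(sorted(sorted(c) for c in clusters))
# prints [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
```

In this toy input the first split separates the distant group from the two close ones, and only the second PCA, run on the remaining individuals alone, separates the closely related pair. A variance-share threshold is a crude stopping rule for small or noisy groups; it is used here only to keep the sketch short.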
To maximize the potential of genomics in medicine, it is essential to establish databases of genomic variants for ethno‐geographic groups that can be used for filtering and prioritizing candidate pathogenic variants. Populations with non‐European ancestry are poorly represented among current genomic variant databases. Here, we report the first high‐density survey of genomic variants for the Thai population, the Thai Reference Exome (T‐REx) variant database. T‐REx comprises exome sequencing data of 1092 unrelated Thai individuals. The targeted exome regions common among four capture platforms cover 30.04 Mbp on autosomes and chromosome X. 345 681 short variants (18.27% of which are novel) and 34 907 copy number variations were found. Principal component analysis on 38 469 single nucleotide variants present worldwide showed that the Thai population is most genetically similar to East and Southeast Asian populations. Moreover, unsupervised clustering revealed six Thai subpopulations consistent with the evidence of gene flow from neighboring populations. The prevalence of common pathogenic variants in T‐REx was investigated in detail, which revealed subpopulation‐specific patterns, in particular variants associated with erythrocyte disorders such as the HbE variant in HBB and the Viangchan variant in G6PD. T‐REx serves as a pivotal addition to the current databases for genomic medicine.