2007
DOI: 10.1371/journal.pgen.0030160

PCA-Correlated SNPs for Structure Identification in Worldwide Human Populations

Abstract: Existing methods to ascertain small sets of markers for the identification of human population structure require prior knowledge of individual ancestry. Based on Principal Components Analysis (PCA), and recent results in theoretical computer science, we present a novel algorithm that, applied on genomewide data, selects small subsets of SNPs (PCA-correlated SNPs) to reproduce the structure found by PCA on the complete dataset, without use of ancestry information. Evaluating our method on a previously described…

Cited by 259 publications (355 citation statements)
References 62 publications
“…2A). Our method is a generalization of the approach of Paschou et al (35) and estimates genome-wide proportion of West African ancestry for a given individual as P = b/(a + b), where b and a are the chord distances from the European and West African centroids, respectively, for the given individual along PC1. Our generalization involves undertaking the PC1 distance analysis on a grid of points along the genome (as opposed to genome-wide) centered on 15 SNP windows and using a Hidden Markov Model (HMM) for inference of ancestry state (i.e., having "0," "1," or "2" chromosomes of recent African origin; see SI Text, Fig.…”
Section: Results (mentioning)
confidence: 99%
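
The genome-wide estimate quoted above, P = b/(a + b), can be illustrated with a minimal sketch. It assumes each individual already has a PC1 score and that the "chord distance" along a single component reduces to an absolute difference; the function name, variable names, and sample scores are hypothetical and are not taken from the citing paper. The windowed, HMM-based generalization the statement describes is not covered here.

```python
from statistics import mean


def pc1_ancestry_proportion(score, eur_centroid, afr_centroid):
    """Estimate the West African ancestry proportion P = b / (a + b).

    a: distance of the individual from the West African centroid on PC1
    b: distance of the individual from the European centroid on PC1
    Assumption: distance along one component is the absolute difference.
    """
    a = abs(score - afr_centroid)
    b = abs(score - eur_centroid)
    if a + b == 0:
        # Both centroids coincide with the individual: P is undefined.
        raise ValueError("centroids coincide with the individual's score")
    return b / (a + b)


# Hypothetical PC1 scores for two reference groups (illustrative values only).
eur_pc1 = [-0.02, 0.00, 0.02]
afr_pc1 = [0.98, 1.00, 1.02]

eur_centroid = mean(eur_pc1)
afr_centroid = mean(afr_pc1)

# An individual a quarter of the way from the European centroid
# to the West African centroid along PC1.
p_admixed = pc1_ancestry_proportion(0.25, eur_centroid, afr_centroid)
p_at_eur = pc1_ancestry_proportion(eur_centroid, eur_centroid, afr_centroid)
p_at_afr = pc1_ancestry_proportion(afr_centroid, eur_centroid, afr_centroid)
```

Under these assumptions, an individual sitting on the European centroid has b = 0 and so P = 0, while an individual on the West African centroid has a = 0 and so P = 1.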
“…Although the question of the minimum number of markers required to detect the first n components is an interesting one, 6,34 we are more concerned with samples genotyped for GWA studies, and these are typically typed, at least initially, on one of the standard panels such as the Illumina HumanHap 300 or Affymetrix Mapping 500k marker sets. The PCA that we performed with a marker set formed from the intersection between the Illumina HumanHap 300 and Affymetrix Mapping 500k panels is therefore particularly interesting as it shows that a common panel is almost equally good at detecting the first few components as the full 129 673 SNP panel; therefore, at least for purposes of identifying sample origins, samples typed on either platform can be included in the same analyses.…”
Section: Discussion (mentioning)
confidence: 99%
“…However, some methods are tractable and capable to efficiently predict breed composition using breed frequencies of thousands of markers (Kuehn et al, 2011). Therefore, it is often desirable to reduce the number of markers according to their information content, in order to create reduced panels for population genetic analysis (Paschou et al, 2007). Many clustering algorithms have been developed employing population genetic data to assign individuals to clusters (Jakobsson and Rosenberg, 2007).…”
Section: Introduction (mentioning)
confidence: 99%