2022
DOI: 10.1016/j.csbj.2022.04.009

NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses

Cited by 12 publications
(7 citation statements)
References 14 publications
“…First, we removed individuals with missing covariates and variants that are: (i) located in structural variants using TriTyper [33], (ii) duplicated, (iii) monomorphic, (iv) potential probe sites (defined as variants in which the probe may have variable affinity due to the presence of other SNPs within 20 bp and with MAF above 1%), or (v) have failed the Hardy-Weinberg Equilibrium (HWE) exact test (p < 10⁻⁵ in controls). Next, we removed individuals with > 10% and variants with more than 5% of missing genetic data and inferred the relatedness between included individuals using KING [34]; those at greater than a second degree level (kinship coefficient > 0.0884) were removed using NAToRA [35] (Figure S2). Unlike the Le Guen et al pipeline, samples were not removed based on ancestry.…”
Section: Methods (mentioning)
confidence: 99%
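
The kinship cutoffs quoted in these statements (0.0884 for second degree, 0.0442 for third degree) match the relationship-inference ranges documented for KING. As a minimal sketch, assuming those documented ranges and treating boundary values as belonging to the more distant category (an assumption; the cited papers do not specify this), a coefficient can be mapped to a degree label as follows. The function name `kinship_degree` is hypothetical and is not part of KING or NAToRA.

```python
def kinship_degree(kinship: float) -> str:
    """Map a KING kinship coefficient to a relationship label.

    Cutoffs follow KING's documented inference ranges. Strict '>' at each
    boundary is an assumption made for this illustration.
    """
    if kinship > 0.354:
        return "duplicate/MZ twin"
    if kinship > 0.177:
        return "1st-degree"
    if kinship > 0.0884:
        return "2nd-degree"
    if kinship > 0.0442:
        return "3rd-degree"
    return "unrelated"


# Illustrative values only, not data from any cited study.
for value in (0.25, 0.10, 0.05, 0.01):
    print(value, kinship_degree(value))
```

Under this mapping, a filter of "kinship coefficient > 0.0884" keeps third-degree pairs together, while a filter of "> 0.0442" treats them as related.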
“…Only two related BL cases, previously reported in Uganda, 34 were identified. We used country‐specific PCs and a genetic relationship matrix (GRM), based on the probability that two individuals i and j share 0, 1, or 2 alleles identical by descent (IBD), 35 to adjust for ancestry. We fit a generalized linear mixed model, controlling for sex, age, P. falciparum detection, country, and ancestry as fixed effects, and GRM as a random variable.…”
Section: Methods (mentioning)
confidence: 99%
“…To create the subset, we selected self-reported Latin American individuals included in the Genetics of Latin American Diversity (GLAD) project [21], as well as individuals from the Women's Health Initiative, the Jackson Heart Study, the 1KGP, the Human Genome Diversity Project, the Framingham Heart Study, the Barbados Asthma Genetics Study, and the MultiEthnic Study of Atherosclerosis (MESA) [11, 22–27]. We calculated the genetic relationship using KING [28] and excluded related individuals using NAToRA [29]. In this work, we considered individuals with third-degree (kinship coefficient > 0.0442) or higher as related.…”
Section: Data Source For Reference Panels (mentioning)
confidence: 99%
“…For CUSCH-LOAD, we also created two additional plink files containing only individuals from Puerto Rico (PR) or the Dominican Republic (DR). We then ran relatedness via KING [28] to get kinship coefficients for each pair of individuals in each target population, and individuals with a third degree or closer relationship were removed using NAToRA [29].…”
Section: Quality Control For Target Populations (mentioning)
confidence: 99%
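
The workflow these statements describe (compute pairwise kinship, then drop individuals until no related pair remains) can be illustrated with a simple greedy heuristic. This is a sketch only and is not NAToRA's algorithm, which the statements above do not describe: it repeatedly removes the individual with the most remaining relatives. The function name `prune_related`, the `(id1, id2, kinship)` tuple layout, and the alphabetical tie-break are all assumptions made for the example.

```python
from collections import defaultdict


def prune_related(pairs, threshold=0.0442):
    """Return the set of IDs to remove so that no related pair remains.

    pairs: iterable of (id1, id2, kinship) tuples (hypothetical layout).
    threshold: pairs with kinship above this value count as related.
    Greedy heuristic: drop the individual with the most remaining relatives.
    """
    neighbors = defaultdict(set)
    for a, b, kinship in pairs:
        if a != b and kinship > threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    removed = set()
    while True:
        degrees = {n: len(rel) for n, rel in neighbors.items() if rel}
        if not degrees:
            break  # no related pairs left
        # Highest degree first; ties broken alphabetically for determinism.
        worst = max(sorted(degrees), key=degrees.get)
        for other in neighbors[worst]:
            neighbors[other].discard(worst)
        neighbors[worst].clear()
        removed.add(worst)
    return removed


# Illustrative values only: A is related to B, C and D; E and F are unrelated.
example = [("A", "B", 0.25), ("A", "C", 0.12), ("A", "D", 0.05), ("E", "F", 0.02)]
print(prune_related(example))
```

In this example, removing the single hub individual A resolves all three related pairs, whereas removing one member of each pair in turn could discard up to three individuals. That difference is the loss of dataset size that a pruning method aims to minimize.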