2021
DOI: 10.1101/2021.10.21.465343
Preprint

NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses

Abstract: Genetic and omics analyses frequently require independent observations, a condition that real datasets do not guarantee. When relatedness cannot be accounted for, the usual solution is to remove related individuals (or observations), which reduces the available data. We developed a network-based relatedness-pruning method that minimizes dataset reduction while removing unwanted relationships in a dataset. It uses the node degree centrality metric to identify highly connected nodes (or individuals) and implemen…
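The abstract describes a network in which individuals are nodes, related pairs are edges, and node degree centrality flags highly connected individuals; the rest of the method is truncated here. The following is a minimal sketch of greedy degree-based pruning under that reading only, assuming edges are drawn at a kinship cutoff of 0.0884 and that the highest-degree node is removed repeatedly. The helpers `build_relatedness_graph` and `greedy_degree_prune` are hypothetical and are not the NAToRA implementation.

```python
from collections import defaultdict

# Hypothetical helpers: a sketch of greedy degree-based pruning,
# not the NAToRA implementation.


def build_relatedness_graph(pairs, threshold=0.0884):
    """Link two individuals when their kinship coefficient meets the threshold."""
    graph = defaultdict(set)
    for a, b, phi in pairs:
        if phi >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def greedy_degree_prune(graph):
    """Repeatedly drop the individual with the most remaining relatives.

    Stops when no related pair is left and returns the set of removed IDs.
    Ties on degree are broken by the smallest ID (compared as a string) so
    that runs are reproducible.
    """
    adj = {node: set(nbrs) for node, nbrs in graph.items()}
    removed = set()
    while adj:
        node = min(adj, key=lambda n: (-len(adj[n]), str(n)))
        if not adj[node]:
            break  # highest remaining degree is 0: no related pairs left
        for nbr in adj.pop(node):
            adj[nbr].discard(node)
        removed.add(node)
    return removed


pairs = [
    ("A", "B", 0.25),   # first-degree level
    ("A", "C", 0.25),
    ("A", "D", 0.125),  # second-degree level
    ("E", "F", 0.06),   # below the cutoff: not linked
]
print(greedy_degree_prune(build_relatedness_graph(pairs)))  # {'A'}
```

Removing the single hub "A" breaks all three links, so B, C, and D are all retained. A greedy rule like this is a heuristic: it is fast on large networks but does not by itself guarantee that the fewest individuals are removed.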

Cited by 2 publications (4 citation statements); references 6 publications.

“…Based on these IBD probabilities, we calculated the pairwise kinship coefficient ($\Phi_{ij}$) as a function of IBD-sharing, $\Phi_{ij} = \tfrac{1}{2}\delta_{2,ij} + \tfrac{1}{4}\delta_{1,ij}$. We modeled the genetic relationships among individuals as networks [52], in which pairs of individuals were linked if they had a $\Phi_{ij}$ threshold $\geq 0.0884$ (i.e., first- and second-degree relatives [53]). Then, we excluded related individuals using the maximum clique graph approach to minimize sample loss [52].…”
Section: Methods | Citation type: mentioning | Confidence: 99%
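The kinship formula in this statement can be checked numerically. The sketch below evaluates $\Phi_{ij} = \tfrac{1}{2}\delta_{2,ij} + \tfrac{1}{4}\delta_{1,ij}$ for the expected IBD-sharing probabilities of a few non-inbred relative pairs and applies the 0.0884 linkage cutoff, which is approximately $2^{-3.5}$, the KING boundary between second- and third-degree relatives. The function name `kinship` and the table of expected probabilities are illustrative assumptions, not code from the citing study.

```python
def kinship(p_ibd1: float, p_ibd2: float) -> float:
    """Kinship coefficient from IBD-sharing probabilities:
    Phi = 1/2 * P(IBD = 2) + 1/4 * P(IBD = 1)."""
    return 0.5 * p_ibd2 + 0.25 * p_ibd1


THRESHOLD = 0.0884  # linkage cutoff quoted above (about 2 ** -3.5)

# Expected (P(IBD=1), P(IBD=2)) for non-inbred relative pairs.
EXPECTED_IBD = {
    "parent-offspring": (1.0, 0.0),
    "full siblings": (0.5, 0.25),
    "half siblings": (0.5, 0.0),
    "first cousins": (0.25, 0.0),
}

for relation, (k1, k2) in EXPECTED_IBD.items():
    phi = kinship(k1, k2)
    print(f"{relation:16s} phi={phi:.4f} linked={phi >= THRESHOLD}")
```

Parent-offspring pairs and full siblings both give $\Phi = 0.25$, half siblings give 0.125, and first cousins give 0.0625. Only the first-cousin pair falls below the cutoff, consistent with linking first- and second-degree relatives only.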
“…We modeled the genetic relationships among individuals as networks [52], in which pairs of individuals were linked if they had a $\Phi_{ij}$ threshold $\geq 0.0884$ (i.e., first- and second-degree relatives [53]). Then, we excluded related individuals using the maximum clique graph approach to minimize sample loss [52]. We performed unsupervised principal components analysis [13] and unsupervised ADMIXTURE analysis [17] on the European reference data.…”
Section: Methods | Citation type: mentioning | Confidence: 99%
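The statement above calls the pruning step a "maximum clique graph approach". Keeping as many individuals as possible with no related pair among them is the maximum independent set problem on the relatedness graph. This is the same as finding a maximum clique in the complement graph, whose edges join unrelated individuals. The exhaustive sketch below uses the hypothetical name `max_unrelated_subset`. It is exact but runs in exponential time, so it is only practical for small connected components. It is not the citing authors' or NAToRA's code.

```python
from itertools import combinations


def max_unrelated_subset(nodes, related_pairs):
    """Largest subset of `nodes` containing no related pair (exhaustive search).

    Equivalent to a maximum clique in the complement graph, whose edges join
    individuals that are NOT related. Runs in exponential time, so it is only
    suitable for small connected components.
    """
    related = {frozenset(pair) for pair in related_pairs}
    ordered = sorted(nodes)
    # Try the largest candidate subsets first; the first valid one is maximum.
    for size in range(len(ordered), 0, -1):
        for subset in combinations(ordered, size):
            if all(frozenset(p) not in related for p in combinations(subset, 2)):
                return set(subset)
    return set()


# Individual A is related to B, C, and D, which are unrelated to one another:
# keeping B, C, and D loses only one individual.
star = [("A", "B"), ("A", "C"), ("A", "D")]
print(sorted(max_unrelated_subset({"A", "B", "C", "D"}, star)))  # ['B', 'C', 'D']
```

In practice, exact search like this is usually reserved for small components. Larger ones call for a heuristic or a dedicated clique or independent-set solver.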
“…First, we removed individuals with missing covariates and variants that are: (i) located in structural variants using TriTyper [33], (ii) duplicated, (iii) monomorphic, (iv) potential probe sites (defined as variants in which the probe may have variable affinity due to the presence of other SNPs within 20 bp and with MAF above 1%), or (v) have failed the Hardy-Weinberg Equilibrium (HWE) exact test ($p < 10^{-5}$ in controls). Next, we removed individuals with > 10% and variants with more than 5% of missing genetic data and inferred the relatedness between included individuals using KING [34]; those at greater than a second degree level (kinship coefficient > 0.0884) were removed using NAToRA [35] (Figure S2). Unlike the Le Guen et al pipeline, samples were not removed based on ancestry.…”
Section: Autosomal QC | Citation type: mentioning | Confidence: 99%
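Two steps in this QC description are easy to make concrete: the missingness filters (individuals with > 10% and variants with > 5% missing calls) and the kinship cutoff used before pruning. The sketch below makes several assumptions. Genotypes are allele counts with `None` marking a missing call. Samples are filtered before variant missingness is recomputed on the remaining samples; the quote does not state an order. `king_degree` uses the KING documentation's kinship cutoffs (0.354, 0.177, 0.0884, 0.0442). Both function names are hypothetical. Real pipelines usually apply these filters with dedicated tools, for example PLINK's `--mind` and `--geno` options.

```python
from typing import List, Optional, Tuple

# Hypothetical genotype encoding: allele counts 0/1/2, None = missing call.
Genotype = Optional[int]


def missingness_filter(
    geno: List[List[Genotype]],
    max_sample_missing: float = 0.10,
    max_variant_missing: float = 0.05,
) -> Tuple[List[int], List[int]]:
    """Return indices of samples and variants that pass the missingness limits.

    Assumes samples are filtered first and variant missingness is then
    computed over the retained samples only.
    """
    n_variants = len(geno[0]) if geno else 0
    if n_variants == 0:
        return list(range(len(geno))), []
    samples = [
        i for i, row in enumerate(geno)
        if sum(g is None for g in row) / n_variants <= max_sample_missing
    ]
    if not samples:
        return [], []
    variants = [
        j for j in range(n_variants)
        if sum(geno[i][j] is None for i in samples) / len(samples) <= max_variant_missing
    ]
    return samples, variants


def king_degree(phi: float) -> str:
    """Map a KING kinship estimate to a relationship degree (KING cutoffs)."""
    if phi > 0.354:
        return "duplicate/MZ twin"
    if phi > 0.177:
        return "1st degree"
    if phi > 0.0884:
        return "2nd degree"
    if phi > 0.0442:
        return "3rd degree"
    return "unrelated"


geno = [
    [0, 1, 2, 0, 1, 2, 0, 1, 2, 0],        # 0% missing: kept
    [0, 1, None, 0, 1, 2, 0, 1, 2, 0],     # 10% missing: kept (not > 10%)
    [None, None, 2, 0, 1, 2, 0, 1, 2, 0],  # 20% missing: removed
]
print(missingness_filter(geno))  # ([0, 1], [0, 1, 3, 4, 5, 6, 7, 8, 9])
print(king_degree(0.125))        # 2nd degree
```

Under the quoted rule, pairs classified as "2nd degree" or closer (kinship > 0.0884) are the ones passed on to relatedness pruning.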