2020
DOI: 10.1101/2020.04.21.053876
Preprint

XGMix: Local-Ancestry Inference with Stacked XGBoost

Abstract: Genomic medicine promises increased resolution for accurate diagnosis, for personalized treatment, and for identification of population-wide health burdens at rapidly decreasing cost (with a genotype now cheaper than an MRI and dropping). The benefits of this emerging form of affordable, data-driven medicine will accrue predominantly to those populations whose genetic associations have been mapped, so it is of increasing concern that over 80% of such genome-wide association studies (GWAS) have been conducted s…

Cited by 8 publications (8 citation statements); references: 20 publications.
“…Other purely discriminative approaches have recently been developed. Kumar et al (2020) have described an approach that employs boosted gradient trees to perform local ancestry inference much faster and with fewer computational resources than existing methods, while maintaining comparable accuracy. A similar method, using neural networks has also been described recently (Montserrat et al, 2020).…”
Section: Discussion (mentioning); confidence: 99%
“…Instead, they attempt to learn directly from segments of known ancestry the conditional distribution of ancestries given haplotype data. Discriminative models make fewer assumptions about the demographic process underlying admixture and typically scale better to large datasets (Omberg et al, 2012; Kumar et al, 2020). A number of discriminative approaches have been described (Brisbin et al, 2012; Omberg et al, 2012; Maples et al, 2013; Kumar et al, 2020; Montserrat et al, 2020).…”
Section: Introduction (mentioning); confidence: 99%
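The discriminative idea in this statement, learning P(ancestry | haplotype) directly from reference segments of known ancestry, can be sketched as follows. This is a minimal illustration under assumptions of our own: the chromosome is cut into fixed-size SNP windows, and each window gets an independent multinomial logistic regression fitted with plain NumPy. Logistic regression is a swapped-in stand-in for the gradient-boosted trees attributed to Kumar et al. (2020); the function names (`fit_softmax`, `predict_proba`, `windowed_ancestry`) and defaults (`window=50`, learning rate, epochs) are hypothetical, and no smoothing across windows is attempted.

```python
import numpy as np


def fit_softmax(X, y, n_classes, lr=0.5, epochs=300, l2=1e-3):
    """Multinomial logistic regression trained by batch gradient descent (NumPy only)."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot labels, shape (n, k)
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        grad = P - Y  # gradient of cross-entropy w.r.t. logits, per sample
        W -= lr * (X.T @ grad / n + l2 * W)  # L2 penalty on weights only
        b -= lr * grad.mean(axis=0)
    return W, b


def predict_proba(W, b, X):
    """Softmax class probabilities P(ancestry | window genotypes)."""
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)


def windowed_ancestry(ref_X, ref_y, query_X, n_classes, window=50):
    """Per-window ancestry posteriors for query haplotypes (hypothetical helper).

    ref_X:   (n_ref, m) 0/1 SNP matrix of reference haplotypes of known ancestry
    ref_y:   (n_ref,) integer ancestry labels
    query_X: (n_query, m) 0/1 SNP matrix to label
    Returns an (n_query, n_windows, n_classes) array of probabilities.
    """
    m = ref_X.shape[1]
    per_window = []
    for s in range(0, m, window):
        cols = slice(s, s + window)
        W, b = fit_softmax(ref_X[:, cols].astype(float), ref_y, n_classes)
        per_window.append(predict_proba(W, b, query_X[:, cols].astype(float)))
    return np.stack(per_window, axis=1)
```

Taking `argmax` over the last axis gives one ancestry call per window, which is the local-ancestry output; a real method would typically add a second stage that smooths these calls along the chromosome.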
“…Genome sequences are composed of four nucleotides, typically represented with the letters: A, T, C and G. While the majority of genomic positions are fixed across individuals of the same species, a small fraction is known to be variable. Most of these positions are single-nucleotide polymorphisms (SNPs) that have two variants or forms, which allows for a binary encoding with a common or majority variant (encoded as a zero) shared among the majority of individuals and a minority or alternative variant (encoded as a one) (Avallone et al, 2020; Ioannidis et al, 2020; Kumar et al, 2020; Maples et al, 2013; Thornton and Bermejo, 2014).…”
Section: Genomic Data and Its Applications (mentioning); confidence: 99%
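The binary encoding described in this statement can be illustrated with a short sketch. It is a minimal Python/NumPy illustration under our own assumptions: haplotypes arrive as equal-length nucleotide strings, every position is biallelic, and the majority allele is determined from the sample at hand (real pipelines usually encode the reference/alternative allele from a VCF instead). The function name `encode_biallelic` is hypothetical.

```python
import numpy as np


def encode_biallelic(haplotypes: list[str]) -> np.ndarray:
    """Encode biallelic SNP haplotypes as a 0/1 matrix (hypothetical helper).

    haplotypes: equal-length strings over A/C/G/T, one string per haplotype.
    Returns an (n_haplotypes, n_snps) int8 array where 0 marks the majority
    allele at that position and 1 marks the minority (alternative) allele.
    """
    arr = np.array([list(h) for h in haplotypes])  # (n, m) array of single characters
    n, m = arr.shape
    encoded = np.zeros((n, m), dtype=np.int8)
    for j in range(m):
        # np.unique returns alleles sorted alphabetically, so on a tie the
        # alphabetically first allele is treated as the majority.
        alleles, counts = np.unique(arr[:, j], return_counts=True)
        if len(alleles) > 2:
            raise ValueError(f"position {j} is not biallelic: {alleles}")
        majority = alleles[np.argmax(counts)]
        encoded[:, j] = (arr[:, j] != majority).astype(np.int8)
    return encoded
```

For example, `encode_biallelic(["ACGT", "ACGA", "TCGA"])` yields rows `[0, 0, 0, 1]`, `[0, 0, 0, 0]` and `[1, 0, 0, 0]`: A is the majority base at the first and last positions, and the second and third positions are monomorphic in this toy sample, so they encode as all zeros.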
“…Chm-22 and Chm-1 include the same set of individuals, but with only the subset of their genome sequence encoded on chromosome 22 and chromosome 1, respectively, considered. Chm-22-SIM is an augmented version of the Chm-22 data: it contains simulated descendants of the real individuals, created using a recombination simulation program, PyAdmix [23] with the simulations performed independently on the train and validation partitions of Chm-22. A total of 400 individuals per ancestry are generated in the training set and 50 in the validation set.…”
Section: Experiments: Datasets (mentioning); confidence: 99%
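The idea of creating simulated descendants by recombination can be sketched roughly as below. This is not PyAdmix's interface, which the excerpt does not describe; it is a minimal NumPy sketch under our own assumptions: crossovers follow a Poisson process at one per Morgan with no interference, each child haplotype is a mosaic of two parental haplotypes, and per-SNP ancestry labels are carried along so the simulated data has known local ancestry. The names `simulate_recombinant` and `simulate_admixed` and all parameters are hypothetical.

```python
import numpy as np


def simulate_recombinant(hap_a, hap_b, anc_a, anc_b, positions_cm, rng):
    """Build one recombinant haplotype from two parental haplotypes.

    hap_a, hap_b: (m,) 0/1 SNP arrays; anc_a, anc_b: (m,) per-SNP ancestry labels.
    positions_cm: (m,) increasing genetic positions in centimorgans.
    Assumes crossovers occur as a Poisson process at rate 1 per Morgan.
    """
    length_morgans = (positions_cm[-1] - positions_cm[0]) / 100.0
    n_cross = rng.poisson(length_morgans)
    breaks = np.sort(rng.uniform(positions_cm[0], positions_cm[-1], size=n_cross))
    # Parity of the number of breakpoints at or before each SNP decides which parent is copied.
    n_before = np.searchsorted(breaks, positions_cm, side="right")
    start_with_a = rng.random() < 0.5
    use_a = (n_before % 2 == 0) == start_with_a
    return np.where(use_a, hap_a, hap_b), np.where(use_a, anc_a, anc_b)


def simulate_admixed(haps, ancs, positions_cm, generations, n_out, rng):
    """Recombine a haplotype pool for several generations of random pairing.

    Each generation draws n_out children, each from two distinct haplotypes of
    the previous pool, so n_out must be at least 2 when generations > 1.
    """
    pool_h, pool_a = list(haps), list(ancs)
    for _ in range(generations):
        new_h, new_a = [], []
        for _ in range(n_out):
            i, j = rng.choice(len(pool_h), size=2, replace=False)
            h, a = simulate_recombinant(pool_h[i], pool_h[j], pool_a[i], pool_a[j], positions_cm, rng)
            new_h.append(h)
            new_a.append(a)
        pool_h, pool_a = new_h, new_a
    return np.array(pool_h), np.array(pool_a)
```

Calling `simulate_admixed` separately on the training and validation haplotype pools mirrors the statement's point that simulations were run independently per partition, so no simulated individual mixes haplotypes from both partitions.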