An empirical comparison between polygenic risk scores and machine learning for case/control classification

Muneeb, Muhammad; Feng, Samuel F.; Henschel, Andreas

doi:10.21203/rs.3.rs-1298372/v1

Cited by 4 publications

(4 citation statements)

References 14 publications


“…However, the linear polygenic models do not have the sufficient expressive capacity to learn and transfer complex representations across subpopulations with different genetic architectures. Recent studies indicate that the deep learning models capable of capturing complex nonlinear interactions generally outperform the linear disease prediction models (83)(84)(85).…”

Section: P(y, x) = P(y|x) • P(x) (mentioning)

confidence: 99%

Addressing the Challenge of Biomedical Data Inequality: An Artificial Intelligence Perspective

Gao

Sharma

Cui

2023

Annu. Rev. Biomed. Data Sci.


Artificial intelligence (AI) and other data-driven technologies hold great promise to transform healthcare and confer the predictive power essential to precision medicine. However, the existing biomedical data, which are a vital resource and foundation for developing medical AI models, do not reflect the diversity of the human population. The low representation in biomedical data has become a significant health risk for non-European populations, and the growing application of AI opens a new pathway for this health risk to manifest and amplify. Here we review the current status of biomedical data inequality and present a conceptual framework for understanding its impacts on machine learning. We also discuss the recent advances in algorithmic interventions for mitigating health disparities arising from biomedical data inequality. Finally, we briefly discuss the newly identified disparity in data quality among ethnic groups and its potential impacts on machine learning. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 6 is August 2023. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
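The citation statement for this entry contrasts linear polygenic models with nonlinear deep learning models. As a minimal sketch of what "linear" means here, a polygenic risk score can be written as a weighted sum of allele counts. The effect sizes, genotypes, and threshold below are hypothetical values chosen for illustration and do not come from any of the cited papers.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[int], effect_sizes: Sequence[float]) -> float:
    """Linear score: sum over SNPs of (effect size * risk-allele count)."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("one effect size is needed per SNP")
    return sum(beta * g for beta, g in zip(effect_sizes, dosages))


def classify(score: float, threshold: float) -> str:
    """Call a sample a case when its score exceeds the (hypothetical) threshold."""
    return "case" if score > threshold else "control"


# Hypothetical per-SNP effect sizes and one individual's allele counts (0, 1, or 2).
effects = [0.5, -0.25, 1.0]
person = [2, 1, 0]

score = polygenic_risk_score(person, effects)
label = classify(score, threshold=0.5)
print(score, label)
```

Because each SNP contributes independently, a score of this form cannot express interactions between SNPs, which is the limitation the quoted statement refers to.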



“…These algorithms have already been used for genotype-phenotype [48][49][50] prediction, making them applicable to genotype-phenotype transfer learning.…”

Section: Recurrent Neural Network (mentioning)

confidence: 99%

Transfer learning for genotype–phenotype prediction using deep learning models

2022

Self Cite


Background: For some understudied populations, genotype data for genotype-phenotype prediction are minimal. However, we can use the data of other, larger populations to learn about the disease-causing SNPs and use that knowledge for the genotype-phenotype prediction of small populations. This manuscript illustrates that transfer learning is applicable to genotype data and genotype-phenotype prediction. Results: Using HAPGEN2 and PhenotypeSimulator, we generated eight phenotypes for 500 cases/500 controls (CEU, large population) and 100 cases/100 controls (YRI, small population). We considered 5 (4 phenotypes) and 10 (4 phenotypes) different risk SNPs for each phenotype to evaluate the proposed method. The improvement in accuracy with transfer learning across the eight phenotypes was between 2 and 14.2 percent. The two-tailed p-value between the classification accuracies for all phenotypes without transfer learning and with transfer learning was 0.0306 for the five-risk-SNP phenotypes and 0.0478 for the ten-risk-SNP phenotypes. Conclusion: The proposed pipeline transfers knowledge for the case/control classification of the small population. In addition, we argue that this method can also be used in the realm of endangered species and personalized medicine. If the large population data is extensive compared to the small population data, expect transfer learning results to improve significantly. We show that transfer learning can create powerful models for genotype-phenotype prediction in large, well-studied populations and fine-tune these models to populations where data is sparse.
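The pretrain-then-fine-tune idea described in this abstract can be sketched as follows. This is an illustrative toy, not the authors' pipeline: it swaps their deep learning models for a plain logistic regression, replaces HAPGEN2 and PhenotypeSimulator with a hypothetical `simulate` helper, and assumes both populations share the same five risk SNPs but differ in allele frequency. All sizes, frequencies, and learning settings are assumptions.

```python
import math
import random

N_SNPS = 20                  # hypothetical panel size
RISK_SNPS = [0, 1, 2, 3, 4]  # assumption: same five risk SNPs in both populations


def simulate(n, maf, rng):
    """Toy genotypes (allele counts 0/1/2) with labels from a median split on liability."""
    X, liability = [], []
    for _ in range(n):
        g = [(rng.random() < maf) + (rng.random() < maf) for _ in range(N_SNPS)]
        X.append(g)
        liability.append(sum(g[j] for j in RISK_SNPS) + rng.gauss(0.0, 0.5))
    cut = sorted(liability)[n // 2]
    return X, [1 if v >= cut else 0 for v in liability]


def predict(w, g):
    """Logistic model: w[0] is the intercept, w[1:] are per-SNP weights."""
    z = w[0] + sum(wj * gj for wj, gj in zip(w[1:], g))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, weights=None, epochs=30, lr=0.05):
    """Stochastic gradient ascent on the log-likelihood; `weights` seeds fine-tuning."""
    w = list(weights) if weights is not None else [0.0] * (N_SNPS + 1)
    for _ in range(epochs):
        for g, label in zip(X, y):
            err = label - predict(w, g)
            w[0] += lr * err
            for j, gj in enumerate(g):
                w[j + 1] += lr * err * gj
    return w


def accuracy(w, X, y):
    hits = sum((predict(w, g) >= 0.5) == (label == 1) for g, label in zip(X, y))
    return hits / len(y)


rng = random.Random(0)
X_large, y_large = simulate(1000, 0.30, rng)  # stands in for the large population
X_small, y_small = simulate(200, 0.15, rng)   # stands in for the small population
X_test, y_test = simulate(200, 0.15, rng)     # held-out small-population samples

w_pre = train(X_large, y_large, epochs=30)                     # pretrain
w_transfer = train(X_small, y_small, weights=w_pre, epochs=5)  # fine-tune
w_scratch = train(X_small, y_small, epochs=5)                  # baseline, no transfer

acc_transfer = accuracy(w_transfer, X_test, y_test)
acc_scratch = accuracy(w_scratch, X_test, y_test)
print(f"with transfer: {acc_transfer:.3f}  without: {acc_scratch:.3f}")
```

The only difference between the two small-population runs is the starting point of the weights, which is what makes the comparison a test of transfer rather than of model capacity. The printed accuracies depend on the toy simulation and say nothing about the results reported in the abstract.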


“…Such tools usually adapt complicated machine learning models that may consider nonlinear analysis as well as causality assumptions or inference (Muneeb et al, 2022;Meijering and Gianola, 1985;Sailer and Harms, 2017;Bao et al, 2020;Basu et al, 2018;Lee et al, 2016).…”

Section: Introduction (mentioning)

confidence: 99%

“…These data exposed the challenges of handling correlations between in-between-ome terms (e.g., co-expressions in the transcriptome); however, they also provide opportunities (Wainberg et al., 2019). These data have triggered development of sophisticated tools leveraging in-between-omes to characterize the genetic basis of complex traits. Such tools usually adapt complicated machine learning models that may consider nonlinear analysis as well as causality assumptions or inference (Muneeb et al., 2022; Meijering and Gianola, 1985; Sailer and Harms, 2017; Bao et al., 2020; Basu et al., 2018; Lee et al., 2016). However, there are no standard simulators to benchmark the performance of the newly developed tools, leaving authors to develop different ad hoc simulations tailoring to their works.…”

Section: Introduction (mentioning)

confidence: 99%

OmeSim: a genetics-based nonlinear simulator for in-between-ome and phenotype

Long

Zhang

2024

Preprint


Motivation: Deciphering the genetic basis of complex traits via genotype-phenotype association studies is a long-standing theme in genetics. The availability of molecular omics data (such as transcriptome) has enabled researchers to utilize “in-between-omes” in association studies, for instance transcriptome-wide association studies. Although many statistical tests and machine learning models integrating omics in genetic mapping are emerging, there is no standard way to simulate phenotype by genotype with the role of in-between-omes incorporated. Moreover, the involvement of in-between-omes usually brings substantial nonlinear architecture (e.g., a co-expression network) that may be non-trivial to simulate. As such, rigorous power estimations, a critical step to test novel models, may not be conducted fairly. Results: To address the gap between emerging methods development and the unavailability of adequate simulators, we developed OmeSim, a phenotype simulator incorporating genetics, an in-between-ome (e.g., transcriptome), and their complex relationships including nonlinear architectures. OmeSim outputs detailed causality graphs together with original data, correlations, and association structures between phenotypic traits and ome terms as comprehensive gold-standard datasets for the verification of novel tools integrating an in-between-ome in genotype-phenotype association studies. We expect OmeSim to enable rigorous benchmarking for future multi-omics integrations. Availability: https://github.com/zhoulongcoding/OmeSim Contact: qingrun.zhang@ucalgary.ca

