Eye-Color and Type-2 Diabetes Phenotype Prediction From Genotype Data Using Deep Learning Methods

Muneeb, Muhammad; Henschel, Andreas

doi:10.21203/rs.3.rs-125397/v1

Cited by 1 publication

(2 citation statements)

References 32 publications

“…Represent controls with 0 and cases with 1, and update the CEU_5_1/CEU.sample and YRI_5_1/YRI.sample files accordingly with new phenotypes values after thresholding. Generate a separate phenotype file for both populations, which contains the sample id and the phenotype, and convert CEU_5_1/CEU.gen and YRI_5_1/YRI.gen files in 23andme file format, so the machine learning techniques specified in this article [35] are applicable to genotype-phenotype prediction.…”

Section: Convert Continuous Phenotype To Cases/Controls (mentioning)

confidence: 99%
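
The thresholding step quoted above can be sketched as follows. This is a minimal illustration, not the citing authors' code: the median default threshold, the function names `binarize_phenotype` and `phenotype_rows`, and the sample ids are all assumptions introduced here, and the sketch does not read or write the actual `.sample` files.

```python
from statistics import median


def binarize_phenotype(values, threshold=None):
    """Map continuous phenotype values to 0 (control) or 1 (case).

    Assumption: values strictly above the threshold are cases. The median
    default is illustrative only; the source does not state the threshold.
    """
    if threshold is None:
        threshold = median(values)
    return [1 if v > threshold else 0 for v in values]


def phenotype_rows(sample_ids, values, threshold=None):
    """Pair each sample id with its binary phenotype (hypothetical helper)."""
    labels = binarize_phenotype(values, threshold)
    return list(zip(sample_ids, labels))


# Hypothetical sample ids and simulated continuous phenotype values.
sample_ids = ["id_1", "id_2", "id_3", "id_4"]
continuous = [-1.2, 0.3, 0.9, -0.4]
rows = phenotype_rows(sample_ids, continuous)
```

Each `(sample id, phenotype)` pair in `rows` corresponds to one line of the separate phenotype file the statement describes; the same labels would replace the continuous phenotype column in the population's `.sample` file.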

“…Using 62,283 SNPs for training may overfit the model, so SNPs pre-selection process (p-value threshold or mutation difference between cases/controls at each SNP [35]) will reduce the dimensionality of input data leading to a generalized model. Generate multiple datasets using the SNPs pre-selection process on the training data for both populations.…”

Section: SNPs Pre-selection (mentioning)

confidence: 99%
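
One way to read "mutation difference between cases/controls at each SNP" is sketched below. It assumes genotypes are coded as minor-allele counts (0, 1, 2) and scores each SNP by the absolute difference in mean allele count between cases and controls; both the coding and the scoring rule are assumptions made for illustration, and `select_snps` is a hypothetical name, not a function from the cited work.

```python
def select_snps(genotypes, labels, k):
    """Return the indices of the k SNPs with the largest case/control gap.

    genotypes: list of samples, each a list of allele counts (0, 1, or 2).
    labels:    list of 0 (control) / 1 (case), one per sample.
    Assumption: the score is |mean(cases) - mean(controls)| per SNP.
    """
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    n_snps = len(genotypes[0])
    scores = []
    for j in range(n_snps):
        case_mean = sum(g[j] for g in cases) / len(cases)
        control_mean = sum(g[j] for g in controls) / len(controls)
        scores.append(abs(case_mean - control_mean))

    # Rank SNP indices by score, highest first, and keep the top k.
    ranked = sorted(range(n_snps), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy training set: 4 samples x 3 SNPs (values are illustrative only).
train_genotypes = [[2, 0, 1], [2, 1, 1], [0, 0, 1], [0, 1, 1]]
train_labels = [1, 1, 0, 0]
selected = select_snps(train_genotypes, train_labels, k=1)
```

As the statement notes, the selection is computed on the training data only; the chosen indices are then applied unchanged to the test data, and varying `k` yields the multiple datasets mentioned.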

Transfer learning for genotype–phenotype prediction using deep learning models

2022

Self Cite

Background: For some understudied populations, the genotype data available for genotype-phenotype prediction is minimal. However, we can use the data of other, larger populations to learn about the disease-causing SNPs and apply that knowledge to genotype-phenotype prediction in small populations. This manuscript illustrates that transfer learning is applicable to genotype data and genotype-phenotype prediction.

Results: Using HAPGEN2 and PhenotypeSimulator, we generated eight phenotypes for 500 cases/500 controls (CEU, large population) and 100 cases/100 controls (YRI, small population). We considered 5 risk SNPs (4 phenotypes) and 10 risk SNPs (4 phenotypes) to evaluate the proposed method. Across the eight phenotypes, transfer learning improved accuracy by between 2 and 14.2 percent. The two-tailed p-value between the classification accuracies without and with transfer learning was 0.0306 for the five-risk-SNP phenotypes and 0.0478 for the ten-risk-SNP phenotypes.

Conclusion: The proposed pipeline transfers knowledge for the case/control classification of the small population. In addition, we argue that this method can also be used in the realm of endangered species and personalized medicine. If the large-population data is extensive compared to the small-population data, expect transfer learning results to improve significantly. We show that transfer learning is capable of creating powerful models for genotype-phenotype prediction in large, well-studied populations and fine-tuning these models to populations where data is sparse.
