Background: Very low coverage (0.1 to 1x) whole genome sequencing (WGS) has become a promising and affordable approach to discover genomic variants of human populations for Genome-Wide Association Study (GWAS). To support genetic screening using Preimplantation Genetic Testing (PGT) in a large population, the sequencing coverage goes below 0.1x to an ultra-low level. However, its feasibility and effectiveness for GWAS remains undetermined.
Methods: We devised a pipeline to process ultra-low coverage WGS data and benchmarked the accuracy of genotype imputation at the combination of different coverages below 0.1x and sample sizes from 2,000 to 16,000, using 17,844 embryo PGT with approximately 0.04x average coverage and the standard Chinese sample HG005 with known genotypes. We then applied the imputed genotypes of 1,744 transferred embryos who have gestational ages and complete follow-up records to GWAS.
Results: The accuracy of genotype imputation under ultra-low coverage can be improved by increasing the sample size and applying a set of filters. From 1,744 born embryos, we identified 11 genomic risk loci associated with gestational ages and 166 genes mapped to these loci according to positional, expression quantitative trait locus and chromatin interaction strategies. Among these mapped genes, CRHBP, ICAM1 and OXTR were more frequently reported as preterm birth related. By joint analysis of gene expression data from previous studies, we constructed interrelationships of mainly CRHBP, ICAM1, PLAGL1, DNMT1, CNTLN, DKK1 and EGR2 with preterm birth, infant disease and breast cancer.
Conclusions: This study not only demonstrates that ultra-low coverage WGS could achieve relatively high accuracy of adequate genotype imputation and is capable of GWAS, but also provides insights into uncovering genetic associations of gestational age trait existed in the fetal embryo samples from Chinese or Eastern Asian populations.