As part of a broader collaborative network of exome sequencing studies, we developed a jointly called data set of 5,685 Ashkenazi Jewish (AJ) exomes. We make publicly available a resource of site and allele frequencies, which should serve as a reference for medical genetics in the Ashkenazim (hosted in part at https://ibd.broadinstitute.org, also available in gnomAD at http://gnomad.broadinstitute.org). We estimate that 34% of protein-coding alleles present in the AJ population at frequencies greater than 0.2% are significantly more frequent (mean 15-fold) than their maximum frequency observed in other reference populations. Having arisen via a well-described founder effect approximately 30 generations ago, these enriched alleles can contribute to differences in genetic risk and overall prevalence of diseases between populations. As validation, we document 148 AJ-enriched protein-altering alleles that overlap with "pathogenic" ClinVar alleles (table available at https://github.com/macarthur-lab/clinvar/blob/master/output/clinvar.tsv), including those that account for 10- to 100-fold differences between AJ and non-AJ populations in the prevalence of some rare diseases, especially recessive conditions such as Gaucher disease (GBA, p.Asn409Ser, 8-fold enrichment); Canavan disease (ASPA, p.Glu285Ala, 12-fold enrichment); and Tay-Sachs disease (HEXA, c.1421+1G>C, 27-fold enrichment; p.Tyr427IlefsTer5, 12-fold enrichment). We next sought to use this catalog, of well-established relevance to Mendelian disease, to explore Crohn's disease (CD), a common disease with an estimated two- to four-fold excess prevalence in AJ.
We specifically attempt to evaluate whether strong-acting rare alleles, particularly protein-truncating or otherwise large effect-size alleles, enriched by the same founder effect, contribute excess genetic risk to Crohn's disease in AJ. We find that ten rare genetic risk factors in NOD2 and LRRK2, including several novel contributing alleles, are enriched in AJ (p < 0.005) and show evidence of association with CD. Independently, we find that genome-wide common variant risk defined by GWAS shows a strong difference between AJ and non-AJ European control population samples (0.97 s.d. higher, p < 10^−16). Taken together, the results suggest coordinated selection in the AJ population for higher CD risk alleles in general. The results and approach illustrate the value of exome sequencing data in case-control studies, along with reference data sets like ExAC (sites VCF available via FTP at ftp.broadinstitute.org/pub/ExAC_release/release0.3/), to pinpoint genetic variation that contributes to variable disease predisposition across populations.
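The enrichment figures quoted in this abstract (for example, "mean 15-fold") compare an allele's AJ frequency with its maximum frequency in other reference populations. The following is a minimal sketch of that ratio, assuming a simple per-allele table of frequencies. The record layout, the `floor` guard against division by zero, and all numbers are illustrative assumptions and do not come from the study. The sketch computes the ratio only and does not reproduce whatever significance test the authors applied.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AlleleRecord:
    allele: str                  # hypothetical identifier, not a real variant
    aj_af: float                 # allele frequency in the AJ sample
    other_afs: Dict[str, float]  # frequency per non-AJ reference population


def fold_enrichment(rec: AlleleRecord, floor: float = 1e-5) -> float:
    """AJ frequency divided by the maximum frequency in the other populations.

    `floor` is an assumed lower bound that avoids division by zero when the
    allele is unobserved in every other population.
    """
    max_other = max(rec.other_afs.values(), default=0.0)
    return rec.aj_af / max(max_other, floor)


def enriched_alleles(records: List[AlleleRecord],
                     min_aj_af: float = 0.002) -> List[Tuple[str, float]]:
    """Alleles above the 0.2% AJ frequency threshold with a ratio above 1."""
    out = []
    for rec in records:
        fold = fold_enrichment(rec)
        if rec.aj_af > min_aj_af and fold > 1.0:
            out.append((rec.allele, fold))
    return out


# Made-up frequencies for illustration only.
records = [
    AlleleRecord("allele_A", 0.03, {"pop1": 0.002, "pop2": 0.001}),
    AlleleRecord("allele_B", 0.001, {"pop1": 0.0001}),  # below the 0.2% threshold
    AlleleRecord("allele_C", 0.01, {"pop1": 0.02}),     # more frequent elsewhere
]

for allele, fold in enriched_alleles(records):
    print(f"{allele}: {fold:.1f}-fold")
```

Taking the maximum over the other populations, not their mean, makes the ratio conservative: an allele counts as enriched only if it is rarer in every comparison population.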
Background: Large databases permit quantitative description of genes in terms of intolerance to loss of function (‘haploinsufficiency’) and prevalence of missense variants. We explored these parameters in inherited retinal disease (IRD) genes.

Methods: IRD genes (from the ‘RetNet’ resource) were classified by probability of loss of function intolerance (pLI) using online Genome Aggregation Database (gnomAD) and DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources (DECIPHER) databases. Genes were identified having pLI ≥0.9 together with one or both of the following: upper bound of CI <0.35 for observed to expected (o/e) ratio of loss of function variants in the gnomAD resource; haploinsufficiency score <10 in the DECIPHER resource. IRD genes in which missense variants appeared under-represented or over-represented (Z score for o/e ratio of <−2.99 or >2.99, respectively) were also identified. The genes were evaluated in the gene ontology Protein Analysis THrough Evolutionary Relationships (PANTHER) resource.

Results: Of 280 analysed genes, 39 (13.9%) were predicted loss of function intolerant. A greater proportion of X-linked than autosomal IRD genes fulfilled these criteria, as expected. Most autosomal genes were associated with dominant disease. PANTHER analysis showed >100-fold enrichment of spliceosome tri-snRNP complex assembly. Most encoded proteins were longer than the median length in the UniProt database. Fourteen genes (11 of which were in the ‘haploinsufficient’ group) showed under-representation of missense variants. Six genes (SAMD11, ALMS1, WFS1, RP1L1, KCNV2, ADAMTS18) showed over-representation of missense variants.

Conclusion: A minority of IRD-associated genes appear to be ‘haploinsufficient’. Over-representation of spliceosome pathways was observed. When interpreting genetic tests, variants found in genes with over-representation of missense variants should be interpreted with caution.
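The selection criteria in the Methods can be written as a small filter. The sketch below is one possible reading of those thresholds, not the authors' code. The field names, gene symbols, and values are hypothetical, and the sign convention for the missense Z score follows the abstract as written, so it should be checked against the database's own definition before use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneMetrics:
    symbol: str                    # hypothetical gene label
    pli: float                     # probability of loss of function intolerance
    oe_lof_upper: Optional[float]  # upper bound of CI for o/e LoF ratio (gnomAD)
    decipher_hi: Optional[float]   # haploinsufficiency score (DECIPHER)
    missense_z: float              # Z score for o/e ratio of missense variants


def is_lof_intolerant(g: GeneMetrics) -> bool:
    """pLI >= 0.9 together with one or both supporting criteria."""
    if g.pli < 0.9:
        return False
    gnomad_support = g.oe_lof_upper is not None and g.oe_lof_upper < 0.35
    decipher_support = g.decipher_hi is not None and g.decipher_hi < 10
    return gnomad_support or decipher_support


def missense_class(g: GeneMetrics) -> str:
    """Label missense representation using the thresholds quoted in the text."""
    if g.missense_z < -2.99:
        return "under-represented"
    if g.missense_z > 2.99:
        return "over-represented"
    return "unremarkable"


# Made-up values for illustration only.
genes = [
    GeneMetrics("GENE_A", 0.99, 0.20, None, -3.5),
    GeneMetrics("GENE_B", 0.95, 0.50, 5.0, 0.4),
    GeneMetrics("GENE_C", 0.95, 0.50, 20.0, 1.0),
    GeneMetrics("GENE_D", 0.50, 0.10, 2.0, 3.2),
]

for g in genes:
    print(g.symbol, is_lof_intolerant(g), missense_class(g))
```

Missing scores are treated as "criterion not met" in this sketch, so a gene absent from one database can still qualify through the other.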
As part of a broader collaborative network of exome sequencing studies, we developed a jointly called data set of 5,685 Ashkenazi Jewish exomes. We make publicly available a resource of site and allele frequencies, which should serve as a reference for medical genetics in the Ashkenazim. We estimate that 30% of protein-coding alleles present in the Ashkenazi Jewish population at frequencies greater than 0.2% are significantly more frequent (mean 7.6-fold) than their maximum frequency observed in other reference populations. Arising via a well-described founder effect, this catalog of enriched alleles can contribute to differences in genetic risk and overall prevalence of diseases between populations. As validation we document 151 AJ-enriched protein-altering alleles that overlap with "pathogenic