Eosinophilic esophagitis (EoE) is a chronic inflammatory disorder associated with allergic hypersensitivity to food. We interrogated >1.5 million genetic variants in European EoE cases and subsequently in a multi-site cohort with local and out-of-study control subjects. In addition to replicating the 5q22 locus (meta-analysis p = 1.9×10−16), we identified an association at 2p23 (encoding CAPN14, p = 2.5×10−10). CAPN14 was specifically expressed in the esophagus, was dynamically upregulated as a function of disease activity and genetic haplotype and after exposure of epithelial cells to IL-13, and was located in an epigenetic hotspot modified by IL-13. Esophageal expression was enriched for the genes neighboring the top 208 EoE sequence variants. Multiple allergic sensitization loci were associated with EoE susceptibility (5.1×10−11 < p < 4.8×10−2). We propose a model that explains the tissue-specific nature of EoE through the interplay of allergic sensitization with an EoE-specific, IL-13–inducible esophageal response involving CAPN14.
Next-generation sequencing studies generate a large quantity of genetic data in a relatively cost- and time-efficient manner and provide an unprecedented opportunity to identify candidate causative variants that lead to disease phenotypes. A challenge for these studies is that current technologies generate sequencing artifacts. To identify and characterize the properties that distinguish false-positive variants from true variants, we sequenced a child and both parents (one trio) using DNA isolated from three sources (blood, buccal cells, and saliva). The trio strategy allowed us to identify variants in the proband that could not have been inherited from the parents (Mendelian errors) and that most likely indicate sequencing artifacts. We examined quality control measurements and found that three of them identified the greatest number of Mendelian errors: read depth, genotype quality score, and alternate allele ratio. Filtering the variants on these measurements removed ~95% of the Mendelian errors while retaining 80% of the called variants. These filters were applied independently. After filtering, the concordance between identical samples isolated from different sources was 99.99%, compared with 87% before filtering. This high concordance suggests that different sources of DNA can be used in trio studies without affecting the ability to identify causative polymorphisms. To facilitate analysis of next-generation sequencing data, we developed the Cincinnati Analytical Suite for Sequencing Informatics (CASSI) to store sequencing files and metadata (e.g., relatedness information), manage file versioning, filter data, annotate variants, and identify candidate causative polymorphisms that follow de novo, rare recessive homozygous, or compound heterozygous inheritance models.
We conclude that the data-cleaning process improves the signal-to-noise ratio in terms of variants and facilitates the identification of candidate disease-causative polymorphisms.
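The trio logic and the three quality measurements described above can be illustrated with a minimal Python sketch. This is not the CASSI implementation. The `Call` record, the function names, and every threshold (`min_depth=10`, `min_gq=30`, and the 0.25 to 0.75 heterozygous ratio window) are hypothetical values chosen for illustration, since the abstract does not state the cutoffs that were used. Each filter is written as a separate predicate so that it can be applied on its own.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Call:
    """One biallelic genotype call (hypothetical record, not a CASSI type)."""
    genotype: Tuple[int, int]  # allele indices: 0 = reference, 1 = alternate
    depth: int                 # total reads covering the site
    gq: int                    # genotype quality score
    alt_reads: int             # reads supporting the alternate allele


def is_mendelian_error(child, mother, father):
    """True if the child's genotype cannot be built from one maternal and
    one paternal allele (autosomal sites only)."""
    a, b = child
    return not ((a in mother and b in father) or (b in mother and a in father))


def passes_depth(call, min_depth=10):        # assumed threshold
    return call.depth >= min_depth


def passes_genotype_quality(call, min_gq=30):  # assumed threshold
    return call.gq >= min_gq


def passes_alt_ratio(call, low=0.25, high=0.75):  # assumed window
    """Check the alternate allele ratio of heterozygous calls only;
    homozygous calls are passed through unchanged."""
    if sorted(call.genotype) != [0, 1]:
        return True
    if call.depth == 0:
        return False
    return low <= call.alt_reads / call.depth <= high


def passes_all(call):
    """Convenience combination of the three independent filters."""
    return (passes_depth(call)
            and passes_genotype_quality(call)
            and passes_alt_ratio(call))
```

For example, a heterozygous proband call at a site where both parents are homozygous reference is flagged by `is_mendelian_error((0, 1), (0, 0), (0, 0))`, and a heterozygous call with 3 alternate reads out of 40 fails `passes_alt_ratio` under the assumed window. Counting how many flagged calls each predicate removes, and how many calls overall it retains, is one way to compare filters as the abstract describes.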