Comprehensive and accurate detection of variants from whole-genome sequencing (WGS) is a strong prerequisite for translational genomic medicine; however, low concordance between analytic pipelines is an outstanding challenge. We processed one European and one African WGS sample with 70 analytic pipelines comprising combinations of 7 short-read aligners and 10 variant calling algorithms (VCAs), and observed remarkable differences in the number of variants called by different pipelines (max/min ratio: 1.3–3.4). The similarity between variant call sets was determined more by VCAs than by short-read aligners. Remarkably, reported minor allele frequency had a substantial effect on concordance between pipelines (concordance rate ratio: 0.11–0.92; Wald tests, P < 0.001), entailing more discordant results for rare and novel variants. We compared the performance of analytic pipelines and pipeline ensembles using gold-standard variant call sets and the catalog of variants from the 1000 Genomes Project. Notably, a single pipeline using BWA-MEM and GATK-HaplotypeCaller performed comparably to the pipeline ensembles for ‘callable’ regions (~97%) of the human reference genome. While a single pipeline is capable of analyzing common variants in most genomic regions, our findings demonstrated the limitations and challenges in analyzing rare or novel variants, especially for non-European genomes.
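The pipeline grid and the idea of call-set similarity can be illustrated with a minimal sketch. The abstract does not say how concordance was computed, so the Jaccard index here is an assumption chosen for illustration, and the aligner and caller labels and the toy variants are hypothetical placeholders rather than the tools or data used in the study.

```python
from itertools import combinations, product

# Hypothetical labels: the abstract gives only the counts (7 aligners, 10 VCAs).
aligners = [f"aligner_{i}" for i in range(1, 8)]
callers = [f"vca_{j}" for j in range(1, 11)]

# Every aligner paired with every caller: 7 x 10 = 70 pipelines.
pipelines = list(product(aligners, callers))


def jaccard(a, b):
    """Shared variants divided by all variants seen in either call set."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_concordance(call_sets):
    """Jaccard similarity for every unordered pair of pipelines."""
    return {
        (p, q): jaccard(call_sets[p], call_sets[q])
        for p, q in combinations(sorted(call_sets), 2)
    }


# Toy call sets (assumed data); each variant is (chrom, pos, ref, alt).
toy_calls = {
    ("aligner_1", "vca_1"): {
        ("chr1", 100, "A", "G"),
        ("chr1", 200, "C", "T"),
        ("chr2", 50, "G", "A"),
    },
    ("aligner_2", "vca_1"): {
        ("chr1", 100, "A", "G"),
        ("chr1", 200, "C", "T"),
    },
    ("aligner_1", "vca_2"): {
        ("chr1", 100, "A", "G"),
        ("chr2", 75, "T", "C"),
    },
}

scores = pairwise_concordance(toy_calls)
for (p, q), score in scores.items():
    print(p, q, round(score, 2))
```

In this toy data the two pipelines that share a caller agree more closely than the two that share an aligner, which mirrors the direction of the reported finding but says nothing about its magnitude.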
Autism spectrum disorder (ASD) represents a heterogeneous group of neurodevelopmental disorders and is largely attributable to genetic risk factors. The phenotypic and genetic heterogeneity of ASD has been well recognized; however, genetic substrates for endophenotypes that constitute phenotypic heterogeneity are not yet known. In the present study, we compiled data from the Autism Genetic Resource Exchange, which contains the demographic and detailed phenotype information of 11,961 individuals. Notably, the whole-genome sequencing data available from MSSNG and iHART for 3833 individuals in this dataset were used to perform an endophenotype-wide association study. Using a linear mixed model, genome-wide association analyses were performed for 29 endophenotype scores and 0.58 million common variants with variant allele frequency ≥ 5%. We discovered significant associations between 9 genetic variants and 6 endophenotype scores comprising neurocognitive development and severity scores for core symptoms of ASD at a significance threshold of p < 5 × 10⁻⁷. Of note, the Stereotyped Behaviors and Restricted Interests total score in Autism Diagnostic Observation Schedule Module 3 was significantly associated with multiple variants in the VPS13B gene, a causal gene for Cohen syndrome and a candidate gene for syndromic ASD. Our findings yielded loci with small effect sizes due to the moderate sample size and, thus, require validation in another cohort. Nonetheless, our endophenotype-wide association analysis extends previous candidate gene discovery in the context of genotype and endophenotype association. As a result, these candidate genes may be responsible for specific traits that constitute core symptoms and neurocognitive function of ASD rather than the disorder itself.