Recent years have seen a boom in the application of next‐generation sequencing technology to the study of human disorders, including Autism Spectrum Disorder (ASD), where the focus has been on identifying rare, possibly causative genomic variants in individuals with ASD. Because ASD is highly genetically heterogeneous, a large number of subjects is needed to establish evidence that a variant or gene is associated with ASD; aggregating data across cohorts and studies is therefore necessary. However, methodological inconsistencies and subject overlap across studies complicate data aggregation. Here we present VariCarta, a web‐based database developed to address these challenges by collecting, reconciling, and consistently cataloging literature‐derived genomic variants found in ASD subjects using ongoing semi‐manual curation. Careful manual curation, combined with a robust data import pipeline, rectifies errors, converts variants into a standardized format, identifies and harmonizes cohort overlaps, and documents data provenance. Harmonization is especially important because it prevents double counting of variants, which can inflate gene‐based evidence for ASD association. The database currently contains 170,416 variant events from 10,893 subjects, collected across 61 publications, and reconciles 16,202 variants that have been reported in the literature more than once. VariCarta is freely accessible at http://varicarta.msl.ubc.ca. Autism Res 2019, 12: 1728–1736. © 2019 International Society for Autism Research, Wiley Periodicals, Inc.
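Why harmonizing cohort overlap matters for gene‐level evidence can be illustrated with a minimal Python sketch. It assumes a hypothetical record layout (`VariantEvent`) and a hypothetical alias map that links a subject's identifier in one publication to a canonical subject ID. The alias map stands in for the curated cohort‐overlap links described above. The function names and the toy data (including the placeholder gene name `GENE_X`) are illustrative assumptions, not VariCarta's actual schema or import pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class VariantEvent(NamedTuple):
    """One variant reported in one subject by one publication (hypothetical schema)."""
    publication: str
    subject_id: str  # identifier as reported in that publication
    chrom: str
    pos: int
    ref: str
    alt: str
    gene: str


# Hypothetical: (publication, reported subject ID) -> canonical subject ID.
# Stands in for curation that links the same individual across studies.
AliasMap = Dict[Tuple[str, str], str]


def canonical_subject(ev: VariantEvent, aliases: AliasMap) -> str:
    # Subjects with no curated link are treated as distinct per publication.
    return aliases.get((ev.publication, ev.subject_id), f"{ev.publication}:{ev.subject_id}")


def naive_gene_counts(events: Iterable[VariantEvent]) -> Dict[str, int]:
    """Count every reported event, so a re-reported variant is counted again."""
    counts: Dict[str, int] = defaultdict(int)
    for ev in events:
        counts[ev.gene] += 1
    return dict(counts)


def harmonized_gene_counts(events: Iterable[VariantEvent], aliases: AliasMap) -> Dict[str, int]:
    """Count distinct (canonical subject, variant) pairs per gene."""
    seen: Dict[str, Set[Tuple[str, str, int, str, str]]] = defaultdict(set)
    for ev in events:
        key = (canonical_subject(ev, aliases), ev.chrom, ev.pos, ev.ref, ev.alt)
        seen[ev.gene].add(key)
    return {gene: len(keys) for gene, keys in seen.items()}


if __name__ == "__main__":
    # Toy data only; not real VariCarta records or real coordinates.
    events = [
        VariantEvent("StudyA", "s1", "chr2", 1000, "C", "T", "GENE_X"),
        VariantEvent("StudyB", "p7", "chr2", 1000, "C", "T", "GENE_X"),  # same subject, re-reported
        VariantEvent("StudyB", "p9", "chr2", 1500, "G", "A", "GENE_X"),
    ]
    aliases = {("StudyB", "p7"): "StudyA:s1"}
    print(naive_gene_counts(events))                # {'GENE_X': 3}
    print(harmonized_gene_counts(events, aliases))  # {'GENE_X': 2}
```

In this toy case the naive tally credits `GENE_X` with three variant events, although one subject's variant was simply reported by two studies. Keying on the canonical subject together with the variant's position and alleles removes the duplicate, so only two distinct events count toward the gene.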
Lay Summary
The search for genetic factors underlying Autism Spectrum Disorder (ASD) has yielded numerous studies reporting potentially causative genomic variants found in individuals with ASD. However, methodological differences and subject overlap across studies complicate the assembly of these data, diminishing their utility and accessibility. We developed VariCarta, a web‐based database that aggregates carefully curated, annotated, and harmonized literature‐derived variants identified in individuals with ASD, using ongoing semi‐manual curation.