Efficient tools for data management and integration are essential for many aspects of high-throughput biology. In particular, annotations of genes and human genetic variants are commonly used but highly fragmented across many resources. Here, we describe MyGene.info and MyVariant.info, high-performance web services for querying gene and variant annotation information. These web services are currently accessed more than three million times per month. They also demonstrate a generalizable cloud-based model for organizing and querying biological annotation information. MyGene.info and MyVariant.info are accessible at http://mygene.info and http://myvariant.info. Both are offered free of charge to the research community.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-016-0953-9) contains supplementary material, which is available to authorized users.
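As an illustration of what querying such web services can look like, the following minimal Python sketch builds request URLs for a gene search and a single-variant lookup. The versioned paths (`/v3`, `/v1`), the `query` and `variant` endpoints, and the `q`, `species` and `fields` parameters are assumptions that are not stated in the abstract above; check the services' own documentation before relying on them. The sketch only constructs URLs and performs no network access.

```python
from urllib.parse import urlencode, quote

# Assumed base paths; the abstract gives only the host names.
MYGENE_BASE = "https://mygene.info/v3"
MYVARIANT_BASE = "https://myvariant.info/v1"


def gene_query_url(term, species="human", fields="symbol,name,entrezgene"):
    """Build a URL for a keyword search over gene annotations.

    The parameter names (q, species, fields) are assumptions.
    """
    params = urlencode({"q": term, "species": species, "fields": fields})
    return f"{MYGENE_BASE}/query?{params}"


def variant_url(hgvs_id):
    """Build a URL for fetching one variant by an HGVS-style identifier."""
    # Keep ':' readable; characters such as '>' are percent-encoded.
    return f"{MYVARIANT_BASE}/variant/{quote(hgvs_id, safe=':')}"


if __name__ == "__main__":
    print(gene_query_url("cdk2"))
    print(variant_url("chr7:g.140453134T>C"))
```

The resulting URLs could then be fetched with any HTTP client, and the responses would be expected to be JSON documents.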
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) enters human host cells via angiotensin-converting enzyme 2 (ACE2) and causes coronavirus disease 2019 (COVID-19). Here, through a genome-wide association study, we identify a variant (rs190509934, minor allele frequency 0.2–2%) that downregulates ACE2 expression by 37% (P = 2.7 × 10^-8) and reduces the risk of SARS-CoV-2 infection by 40% (odds ratio = 0.60, P = 4.5 × 10^-13), providing human genetic evidence that ACE2 expression levels influence COVID-19 risk. We also replicate the associations of six previously reported risk variants, of which four were further associated with worse outcomes in individuals infected with the virus (in/near LZTFL1, MHC, DPP9 and IFNAR2). Lastly, we show that common variants define a risk score that is strongly associated with severe disease among cases and modestly improves the prediction of disease severity relative to demographic and clinical factors alone.
Summary
To meet the growing need to make biomedical resources more accessible and reusable, Web APIs, or web services, have become a common way to disseminate knowledge sources. The BioThings APIs are a collection of high-performance, scalable, annotation-as-a-service APIs that automate the integration of biological annotations from disparate data sources. This collection currently includes MyGene.info, MyVariant.info, and MyChem.info for integrating annotations on genes, variants, and chemical compounds, respectively. These APIs are used by both individual researchers and application developers to simplify annotation retrieval and identifier mapping. Here, we describe the BioThings Software Development Kit (SDK), a generalizable and reusable toolkit for integrating data from multiple disparate data sources and creating high-performance APIs. This toolkit allows users to easily create their own BioThings APIs for any data type of interest, and to keep those APIs up to date with their underlying data sources.
Availability and implementation
The BioThings SDK is built in Python and released via PyPI (https://pypi.org/project/biothings/). Its source code is hosted in its GitHub repository (https://github.com/biothings/biothings.api).
Supplementary information
Supplementary data are available at Bioinformatics online.
Background
Application Programming Interfaces (APIs) are now widely used to distribute biological data, and many popular biological APIs developed by different research teams have adopted JavaScript Object Notation (JSON) as their primary data format. While using a common data format offers significant advantages, that alone is not sufficient for rich integrative queries across APIs.
Results
Here, we have implemented JSON for Linking Data (JSON-LD) technology on the BioThings APIs that we have developed: MyGene.info, MyVariant.info and MyChem.info. JSON-LD provides a standard way to add semantic context to an existing JSON data structure in order to enhance interoperability between APIs. We demonstrated several use cases facilitated by semantic annotations using JSON-LD, including simpler and more precise query capabilities as well as API cross-linking.
Conclusions
We believe that this pattern offers a generalizable solution for interoperability of APIs in the life sciences.
Electronic supplementary material
The online version of this article (10.1186/s12859-018-2041-5) contains supplementary material, which is available to authorized users.
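The idea of adding semantic context to an existing JSON document can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the JSON-LD expansion algorithm and not the authors' implementation: the context URIs under `http://example.org/` and the helper names `add_context` and `fields_for_uri` are hypothetical, chosen only to show how a shared `@context` lets a client find fields by meaning rather than by key name.

```python
import json

# Hypothetical context: maps plain JSON keys to made-up concept URIs.
CONTEXT = {
    "entrezgene": "http://example.org/vocab/ncbigene",
    "uniprot": "http://example.org/vocab/uniprot",
}


def add_context(doc, context):
    """Return a copy of doc with an '@context' entry attached."""
    return {"@context": context, **doc}


def fields_for_uri(doc_ld, uri):
    """Select the fields whose key is mapped to the given URI by '@context'."""
    ctx = doc_ld.get("@context", {})
    return {key: value for key, value in doc_ld.items() if ctx.get(key) == uri}


if __name__ == "__main__":
    gene = {"entrezgene": 1017, "symbol": "CDK2"}
    gene_ld = add_context(gene, CONTEXT)
    print(json.dumps(gene_ld, indent=2))
    print(fields_for_uri(gene_ld, "http://example.org/vocab/ncbigene"))
```

Because two APIs that use different key names can map them to the same URI, a client written this way can link records across APIs without hard-coding each API's field names.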