Profiling immunoglobulin (Ig) receptor repertoires with specialized assays can be costineffective and time-consuming. Here we report ImReP, a computational method for rapid and accurate profiling of the Ig repertoire, including the complementary-determining region 3 (CDR3), using regular RNA sequencing data such as those from 8,555 samples across 53 tissues types from 544 individuals in the Genotype-Tissue Expression (GTEx v6) project. Using ImReP and GTEx v6 data, we generate a collection of 3.6 million Ig sequences, termed the atlas of immunoglobulin repertoires (TAIR), across a broad range of tissue types that often do not have reported Ig repertoires information. Moreover, the flow of Ig clonotypes and inter-tissue repertoire similarities across immune-related tissues are also evaluated. In summary, TAIR is one of the largest collections of CDR3 sequences and tissue types, and should serve as an important resource for studying immunological diseases.
Assay-based approaches provide a detailed view of the adaptive immune system by profiling immunoglobulin (Ig) receptor repertoires. However, these methods carry a high cost and lack the scale of standard RNA sequencing (RNA-Seq). Here we report the development of ImReP, a novel computational method for rapid and accurate profiling of the immunoglobulin repertoire from regular RNA-Seq data. ImReP can also accurately assemble the complementary determining regions 3 (CDR3s), the most variable regions of Ig receptors. We applied our novel method to 8,555 samples across 53 tissues from 544 individuals in the Genotype-Tissue Expression (GTEx v6) project. ImReP is able to efficiently extract Ig-derived reads from RNA-Seq data. Using ImReP, we have created a systematic atlas of 3.6 million Ig sequences across a broad range of tissue types, most of which have not been studied for Ig receptor repertoires.We also compared the GTEx tissues to track the flow of Ig clonotypes across immune-related tissues, including secondary lymphoid organs and organs encompassing mucosal, exocrine, and endocrine sites, and we examined the compositional similarities of clonal populations between these tissues. The Atlas of Immunoglobulin Repertoires (The AIR), is freely available at https://github.com/smangul1/TheAIR/wiki , is one of the largest collection of CDR3 sequences and tissue types. We anticipate this recourse will enhance future immunology studies and advance the development of therapies for human diseases. ImReP is freely available at https://github.com/mandricigor/imrep/wiki Results Related workA number of tools have been developed to reconstruct the Ig receptor repertoire.Repertoire analysis from RNA-Seq data typically starts with mapping the reads to the germline V, D, and J genes obtained from the International ImMunoGeneTics (IMGT) database 12 . There are three possible read mapping scenarios: (1) the read is entirely mapped to the V gene; (2) the read is entirely mapped to the J gene; (3) the read is partially mapped to the V and J genes simultaneously. Existing methods consider only reads from category (3). These methods use different underlying algorithms to map reads to germline genes. For example, MiXCR 8 relies on an in-house alignment procedure, IgBlast 13 utilizes BLAST with an optimized set of parameters, and IMSEQ 14 uses in-house pairwise alignment between the read sequence and the germline V and J segment sequences.Following the alignment, MiXCR performs overlapping of previously aligned reads into contigs. The resulting contigs are re-aligned to the V, D, and J genes to verify that the significant portion of non-template N insertions is covered. In contrast to MiXCR, which simultaneously aligns reads to both V, D, and J genes, IgBlast separately aligns the query read to the databases comprised of V, D, and J genes. IgBlast uses a specific sequence to separately align; first, the program finds the best V gene hit. Then, IgBlast masks the aligned read region and performs an alignment to the J gene database. (In the ...
Improvements in high-throughput sequencing makes targeted amplicon analysis an ideal method for the study of human and environmental microbiomes by undergraduates. Multiple bioinformatics programs are available to process and interpret raw microbial diversity datasets, and the choice of programs to use in curricula is largely determined by student learning goals. Many of the most commonly used microbiome bioinformatics platforms offer end-to-end data processing and data analysis using a command line interface (CLI), but the downside for novice microbiome researchers is the steep learning curve often required. Alternatively, some sequencing providers include processing of raw data and taxonomy assignments as part of their pipelines. This, when coupled with available web-based or graphical user interface (GUI) analysis and visualization tools, eliminates the need for students or instructors to have extensive CLI experience. However, lack of universal data formats can make integration of these tools challenging. For example, tools for upstream and downstream analyses frequently use multiple different data formats which then require writing custom scripts or hours of manual work to make the files compatible. Here, we describe a microbial ecology bioinformatics curriculum that focuses on data analysis, visualization, and statistical reasoning by taking advantage of existing web-based and GUI tools. We created the Program for Unifying Microbiome Analysis Applications (PUMAA), which solves the problem of inconsistent files by formatting the output files from several raw data processing programs to seamlessly transition to a suite of GUI programs for analysis and visualization of microbiome taxonomic and inferred functional profiles. Additionally, we created a series of tutorials to accompany each of the microbiome analysis curricular modules. From pre- and post-course surveys, students in this curriculum self-reported conceptual and confidence gains in bioinformatics and data analysis skills. Students also demonstrated gains in biologically relevant statistical reasoning based on rubric-guided evaluations of open-ended survey questions and the Statistical Reasoning in Biology Concept Inventory. The PUMAA program and associated analysis tutorials enable students and researchers with no computational experience to effectively analyze real microbiome datasets to investigate real-world research questions.
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.