Background: Second generation sequencing has permitted detailed sequence characterisation at the whole genome level of a growing number of non-model organisms, but the data produced have short read-lengths and biased genome coverage, leading to fragmented genome assemblies. The PacBio RS long-read sequencing platform offers the promise of increased read length and unbiased genome coverage, and thus the potential to produce genome sequence data of a finished quality containing fewer gaps and longer contigs. However, these advantages come at a much greater cost per nucleotide and with a perceived increase in error-rate. In this investigation, we evaluated the performance of the PacBio RS sequencing platform through the sequencing and de novo assembly of the Potentilla micrantha chloroplast genome.
Results: Following error-correction, a total of 28,638 PacBio RS reads were recovered with a mean read length of 1,902 bp, totalling 54,492,250 nucleotides and representing an average depth of coverage of 320× the chloroplast genome. The dataset covered the entire 154,959 bp of the chloroplast genome in a single contig (100% coverage), compared to seven contigs (90.59% coverage) recovered from Illumina data, and revealed no bias in coverage of GC rich regions. Post-assembly, the data were largely concordant with the Illumina data generated and allowed 187 ambiguities in the Illumina data to be resolved. The additional read length also permitted small differences in the two inverted repeat regions to be assigned unambiguously.
Conclusions: To our knowledge, this is the first report of a chloroplast genome assembled de novo using PacBio sequence data. The PacBio RS data generated here were assembled into a single large contig spanning the P. micrantha chloroplast genome, with a higher degree of accuracy than an Illumina dataset generated at a much greater depth of coverage, due to longer read lengths and lower GC bias in the data. The results we present suggest PacBio data will be of immense utility for the development of genome sequence assemblies containing fewer unresolved gaps and ambiguities and a significantly smaller number of contigs than could be produced using short-read sequence data alone.
During thymic T cell differentiation, TCR repertoires are shaped by negative, positive and agonist selection. In the thymus and in the periphery, repertoires are also shaped by strong inter-clonal and intra-clonal competition to survive death by neglect. Understanding the impact of these events on the T cell repertoire requires direct evaluation of TCR expression in peripheral naïve T cells. Several studies have evaluated TCR diversity, with contradictory results. Some of these studies had intrinsic technical limitations because they used material obtained from T cell pools, which prevents the direct evaluation of clonal sizes. Indeed, with these approaches, identical TCRs may correspond either to different cells expressing the same receptor or to several amplicons from the same T cell. Here, we overcame this limitation by evaluating TCRB expression in individual naïve CD8 T cells. Of the 2269 Tcrb sequences we obtained from 13 mice, 99% were unique. Mathematical analysis of the data showed that the average number of naïve peripheral CD8 T cells expressing the same TCRB is 1.1 cells. Since TCRA co-expression studies could only increase repertoire diversity, these results reveal that the number of naïve T cells with unique TCRs approaches the number of naïve cells. Since thymocytes undergo multiple rounds of division after TCRB rearrangement and 3-5% of thymocytes survive thymic selection events, the number of cells expressing the same TCRB was expected to be much higher. Thus, these results suggest a new repertoire selection mechanism, which strongly selects for full TCRB diversity.
This study relies on a new dataset, the Proton Mafia Members dataset (PMM). The PMM originates from two datasets provided by the Italian Ministry of Justice: the Criminal Records Registry (Casellario dataset) and the Prison Administration Department dataset (henceforth DAP dataset). Formal agreements with the Ministry of Justice made the data available and guarantee the anonymity of all individuals in compliance with current data protection and privacy regulations. The Casellario dataset provided information on the criminal records of individuals convicted of mafia offenses between 1982 and March 2017. The dataset included
Modern single-cell sequencing techniques make it possible to read the unique TCR signature of each cell in a sample of hundreds of T cells. The mathematical challenge is to extrapolate from the properties of a sample to those of the whole repertoire of an individual, made up of many millions of T cells. We consider the distribution of the number of repeats of any TCR in a sample, the mean number of samples needed to find a repeat with probability one half, and the relationship between the true distribution of clonal sizes and that experimentally observed in the sample. We consider two special cases, where the distribution of clonal sizes is geometric, and where a subset of clones in the repertoire is expanded.
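The sampling question described above can be illustrated with a small Python sketch. It is not the authors' analysis: the function names, the parameter `p`, and the choice to count same-TCR pairs are assumptions made for illustration only. Assuming n cells are drawn without replacement from a repertoire of N cells with clone sizes c_i, the expected number of pairs of sampled cells sharing a TCR is C(n, 2) · Σ c_i(c_i − 1) / (N(N − 1)); the code compares this with a Monte Carlo estimate for geometrically distributed clone sizes.

```python
import random
from collections import Counter


def geometric_clone_sizes(num_clones, p, rng):
    """Draw clone sizes k >= 1 with P(k) = (1 - p) * p**(k - 1).

    `p` is an illustrative parameter, not a value taken from the source.
    """
    sizes = []
    for _ in range(num_clones):
        k = 1
        while rng.random() < p:
            k += 1
        sizes.append(k)
    return sizes


def expected_repeat_pairs(sizes, n):
    """Expected number of same-TCR pairs among n cells sampled without replacement."""
    total = sum(sizes)
    # Number of ordered pairs of distinct cells that belong to the same clone.
    same = sum(c * (c - 1) for c in sizes)
    return n * (n - 1) / 2 * same / (total * (total - 1))


def simulate_repeat_pairs(sizes, n, trials, rng):
    """Monte Carlo estimate of the same quantity, for comparison."""
    # One list entry per cell, labelled by the index of its clone.
    cells = [clone for clone, c in enumerate(sizes) for _ in range(c)]
    acc = 0
    for _ in range(trials):
        counts = Counter(rng.sample(cells, n))
        acc += sum(m * (m - 1) // 2 for m in counts.values())
    return acc / trials


if __name__ == "__main__":
    rng = random.Random(0)
    sizes = geometric_clone_sizes(num_clones=5000, p=0.1, rng=rng)
    print("exact    :", expected_repeat_pairs(sizes, n=200))
    print("simulated:", simulate_repeat_pairs(sizes, n=200, trials=2000, rng=rng))
```

When every clone has size one, the expected number of repeated pairs is zero regardless of sample size, which is why repeats in a small sample carry information about clonal expansion in the full repertoire.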