High-throughput sequencing of full-length transcripts using long reads has paved the way for the discovery of thousands of novel transcripts, even in well-annotated mammalian species. The advances in sequencing technology have created a need for studies and tools that can characterize these novel variants. Here, we present SQANTI, an automated pipeline for the classification of long-read transcripts that can assess the quality of data and the preprocessing pipeline using 47 unique descriptors. We apply SQANTI to a neuronal mouse transcriptome using Pacific Biosciences (PacBio) long reads and illustrate how the tool is effective in characterizing and describing the composition of the full-length transcriptome. We perform extensive evaluation of ToFU PacBio transcripts by PCR to reveal that a substantial number of the novel transcripts are technical artifacts of the sequencing approach and that SQANTI quality descriptors can be used to engineer a filtering strategy to remove them. Most novel transcripts in this curated transcriptome are novel combinations of existing splice sites, resulting more frequently in novel ORFs than novel UTRs, and are enriched in both general metabolic and neural-specific functions. We show that these new transcripts have a major impact on the correct quantification of transcript levels by state-of-the-art short-read-based quantification algorithms. By comparing our iso-transcriptome with public proteomics databases, we find that alternative isoforms are elusive to proteogenomics detection. SQANTI allows the user to maximize the analytical outcome of long-read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes.
Summary
DELLA proteins are plant nuclear factors that restrain growth and proliferation in response to hormonal signals. The effects of the manipulation of the DELLA pathway in the making of a berry-like fruit were investigated. The expression of the Arabidopsis thaliana gain-of-function DELLA allele Atgai del in tomato (Solanum lycopersicum L.) produced partially sterile dwarf plants and compacted inflorescences, as expected for a constitutively activated growth repressor. In contrast, antisense silencing of the single endogenous tomato DELLA gene homologue (SlDELLA) produced slender-like plants with elongated flower trusses. Interestingly, the depletion of SlDELLA in tomato was sufficient to overcome the growth arrest normally imposed on the ovary at anthesis, resulting in parthenocarpic fruits in the absence of pollination. Antisense SlDELLA-engineered fruits were smaller in size and elongated in shape compared with wild type. Cell number estimations showed that fruit set, resulting from reduced SlDELLA expression, arose from activated cell elongation at the longitudinal and lateral axes of the fruit pericarp, bypassing phase-II (post-pollination) cell divisions. Parthenocarpy caused by SlDELLA depletion is facultative, as hand pollination restored wild-type fruit phenotype. This indicates that fertilization-associated SlDELLA-independent signals are operational in ovary-fruit transitions. SlDELLA was also found to restrain growth in other reproductive structures, affecting style elongation, stylar hair primordial growth and stigma development.
High-throughput sequencing of full-length transcripts using long reads has paved the way for the discovery of thousands of novel transcripts, even in very well annotated organisms as mice and humans. Nonetheless, there is a need for studies and tools that characterize these novel isoforms. Here we present SQANTI, an automated pipeline for the classification of long-read transcripts that computes 47 descriptors that can be used to assess the quality of the data and of the preprocessing pipelines. We applied SQANTI to a neuronal mouse transcriptome using PacBio long reads and illustrate how the tool is effective in readily describing the composition of and characterizing the full-length transcriptome. We perform extensive evaluation of ToFU PacBio transcripts by PCR to reveal that an important number of the novel transcripts are technical artifacts of the sequencing approach, and that SQANTI quality descriptors can be used to engineer a filtering strategy to remove them. Most novel transcripts in this curated transcriptome are novel combinations of existing splice sites, result more frequently in novel ORFs than novel UTRs and are enriched in both general metabolic and neural specific functions. We show that these new transcripts have a major impact in the correct quantification of transcript levels by state-of-the-art short-read based quantification algorithms. By comparing our iso-transcriptome with public proteomics databases we find that alternative isoforms