Background
High-throughput transcriptome sequencing (RNA-seq) technology promises to discover novel protein-coding and non-coding transcripts, particularly the identification of long non-coding RNAs (lncRNAs) from de novo sequencing data. This requires tools that are not restricted by prior gene annotations, genomic sequences and high-quality sequencing.
Results
We present an alignment-free tool called PLEK (predictor of long non-coding RNAs and messenger RNAs based on an improved k-mer scheme), which uses a computational pipeline based on an improved k-mer scheme and a support vector machine (SVM) algorithm to distinguish lncRNAs from messenger RNAs (mRNAs), in the absence of genomic sequences or annotations. The performance of PLEK was evaluated on well-annotated mRNA and lncRNA transcripts. 10-fold cross-validation tests on human RefSeq mRNAs and GENCODE lncRNAs indicated that our tool could achieve accuracy of up to 95.6%. We demonstrated the utility of PLEK on transcripts from other vertebrates using the model built from human datasets. PLEK attained >90% accuracy on most of these datasets. PLEK also performed well using a simulated dataset and two real de novo assembled transcriptome datasets (sequenced by PacBio and 454 platforms) with relatively high indel sequencing errors. In addition, when run in a single thread, PLEK is approximately eightfold faster than a newly developed alignment-free tool, named Coding-Non-Coding Index (CNCI), and 244 times faster than the most popular alignment-based tool, Coding Potential Calculator (CPC).
Conclusions
PLEK is an efficient alignment-free computational tool to distinguish lncRNAs from mRNAs in RNA-seq transcriptomes of species lacking reference genomes. PLEK is especially suitable for PacBio or 454 sequencing data and large-scale transcriptome data. Its open-source software can be freely downloaded from https://sourceforge.net/projects/plek/files/.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-311) contains supplementary material, which is available to authorized users.
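The abstract does not spell out the k-mer scheme, so the following is only a minimal sketch of what alignment-free k-mer features can look like, not PLEK's actual implementation. The function name `kmer_features`, the range k = 1..5, the sliding window of step 1, and the weighting 1/4^(max_k - k) are all assumptions made for illustration. Windows containing characters other than A, C, G and T count toward the window total but match no pattern.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_features(seq, max_k=5, weighted=True):
    """Return a flat list of k-mer frequencies for k = 1..max_k.

    Hypothetical helper: the k range and the weighting are assumptions,
    not details confirmed by the abstract.
    """
    # Treat RNA and DNA input alike by mapping U to T.
    seq = seq.upper().replace("U", "T")
    features = []
    for k in range(1, max_k + 1):
        # Number of sliding windows of length k (step 1).
        windows = len(seq) - k + 1
        counts = Counter(seq[i:i + k] for i in range(max(windows, 0)))
        # Assumed weighting: shorter k-mers are down-weighted so that each
        # value of k contributes on a comparable scale.
        weight = 1.0 / 4 ** (max_k - k) if weighted else 1.0
        # Enumerate all 4**k patterns in a fixed order so that every
        # sequence maps to a vector with the same layout.
        for pattern in product(ALPHABET, repeat=k):
            kmer = "".join(pattern)
            freq = counts[kmer] / windows if windows > 0 else 0.0
            features.append(weight * freq)
    return features


if __name__ == "__main__":
    vector = kmer_features("ACGUACGUAC")
    print(len(vector))
```

A vector built this way has a fixed length regardless of transcript length, which is what allows it to be passed to a standard classifier such as an SVM without any alignment step.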
Human respiratory syncytial virus (HRSV) is the main cause of acute lower respiratory infections in children under 2 years of age and causes repeated infections throughout life. We investigated the genetic variability of RSV-A circulating in Ontario during the 2010–2011 winter season by sequencing and phylogenetic analysis of the G glycoprotein gene. Among the 201 consecutive RSV isolates studied, RSV-A (55.7%) was more commonly observed than RSV-B (42.3%). Of the RSV-A infections, 59.8% and 90.1% were among children ≤12 months and ≤5 years old, respectively. On phylogenetic analysis of the second hypervariable region of the 112 RSV-A strains, 110 (98.2%) clustered within or adjacent to the NA1 genotype; two isolates were GA5 genotype. Eleven (10%) NA1-related isolates clustered together phylogenetically as a novel RSV-A genotype, named ON1, containing a 72 nucleotide duplication in the C-terminal region of the attachment (G) glycoprotein. The predicted polypeptide is lengthened by 24 amino acids and includes a 23 amino acid duplication. Using RNA secondary structural software, a possible mechanism of duplication occurrence was derived. The 23 amino acid ON1 G gene duplication results in a repeat of 7 potential O-glycosylation sites including three O-linked sugar acceptors at residues 270, 275, and 283. Using Phylogenetic Analysis by Maximum Likelihood (PAML), a total of 19 positively selected sites were observed among Ontario NA1 isolates; six were found to be codons which reverted to the previous state observed in the prototype RSV-A2 strain. The tendency of codon regression in the G-ectodomain may imply a decreased avidity of antibody to the current circulating strains. Further work is needed to document and understand the emergence, virulence, pathogenicity and transmissibility of this novel RSV-A genotype with a 72 nucleotide G gene duplication.
To identify sequence domains important for the neurotoxic and neuroprotective activities of the prion protein (PrP), we have engineered transgenic mice that express a form of murine PrP deleted for a conserved block of 21 amino acids (residues 105-125) in the unstructured, N-terminal tail of the protein. These mice spontaneously developed a severe neurodegenerative illness that was lethal within 1 week of birth in the absence of endogenous PrP. This phenotype was reversed in a dose-dependent fashion by coexpression of wild-type PrP, with five-fold overexpression delaying death beyond 1 year. The phenotype of Tg(PrPΔ105-125) mice is reminiscent of, but much more severe than, those described in mice that express PrP harboring larger deletions of the N-terminus, and in mice that ectopically express Doppel, a PrP paralog, in the CNS. The dramatically increased toxicity of PrPΔ105-125 is most consistent with a model in which this protein has greatly enhanced affinity for a hypothetical receptor that serves to transduce the toxic signal. We speculate that altered binding interactions involving the 105-125 region of PrP may also play a role in generating neurotoxic signals during prion infection.