Alternative splicing (AS) plays a crucial role in the diversification of gene function and regulation. Consequently, the systematic identification and characterization of temporally regulated splice variants is of critical importance to understanding animal development. We have used high-throughput RNA sequencing and microarray profiling to analyze AS in C. elegans across various stages of development. This analysis identified thousands of novel splicing events, including hundreds of developmentally regulated AS events. To make these data easily accessible and informative, we constructed the C. elegans Splice Browser, a web resource in which researchers can mine AS events of interest and retrieve information about their relative levels and regulation across development. The data presented in this study, along with the Splice Browser, provide the most comprehensive set of annotated splice variants in C. elegans to date, and are therefore expected to facilitate focused, high resolution in vivo functional assays of AS function.
[Supplemental material is available for this article. The sequence data from this study have been submitted to the NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi) under accession no. SRA009279. The microarray data from this study have been submitted to the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) under accession no. GSE25927.]
Alternative splicing (AS) is the process by which multiple mRNA transcripts are produced from a single precursor transcript through the differential utilization of splice sites. Alternative splicing is one of the key mechanisms that have evolved in metazoans to generate increased transcriptome complexity and recent studies estimate that greater than 95% of human multi-exon genes express multiple splice isoforms (Pan et al. 2008;. 
Moreover, alternatively spliced exons are often differentially regulated across tissues and during development, suggesting that individual isoforms may serve specific spatial or temporal roles (Hartmann and Valcarcel 2009; Licatalosi and Darnell 2010; Nilsen and Graveley 2010).
The importance of proper regulation of AS during development has been demonstrated in many different instances; one particularly well-studied example is that of the sex determination pathway in Drosophila. In this pathway, the female-specific expression of a splicing regulator transformer stimulates the inclusion of exons in transcripts of the doublesex and fruitless transcription factor genes (Lopez 1998; Forch and Valcarcel 2003). The female-specific isoforms of these transcription factors subsequently activate the expression of genes required for female development, while the male-specific variants induce a gene expression program important for male differentiation (Dulac 2005; Shirangi and McKeown 2007). Similar spatio-temporally regulated AS networks are likely to exist in metazoans. The characterization of these AS networks, and their integration with other layers of gene regulation, will be necessary for a more compl...
End-to-end (E2E) automatic speech recognition (ASR) systems lack the distinct language model (LM) component that characterizes traditional speech systems. While this simplifies the model architecture, it complicates the task of incorporating text-only data into training, which is important to the recognition of tail words that do not occur often in audio-text pairs. While shallow fusion has been proposed as a method for incorporating a pre-trained LM into an E2E model at inference time, it has not yet been explored for very large text corpora, and it has been shown to be very sensitive to hyperparameter settings in the beam search. In this work, we apply shallow fusion to incorporate a very large text corpus into a state-of-the-art E2E ASR model. We explore the impact of model size and show that intelligent pruning of the training set can be more effective than increasing the parameter count. Additionally, we show that incorporating the LM in minimum word error rate (MWER) fine-tuning makes shallow fusion far less dependent on optimal hyperparameter settings, reducing the difficulty of that tuning problem.
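As a rough illustration of the idea, shallow fusion is commonly described as a log-linear interpolation of the E2E model score and the LM score, with an LM weight as the hyperparameter. The sketch below is not the authors' implementation: it applies the interpolation to complete hypotheses for brevity (in practice it is typically applied at each beam search step), and the function names, the `lm_weight` parameter, and the scores are all hypothetical.

```python
def shallow_fusion_score(asr_log_prob, lm_log_prob, lm_weight):
    """Log-linear interpolation of an ASR score and an LM score (assumed form)."""
    return asr_log_prob + lm_weight * lm_log_prob


def pick_best(hypotheses, lm_weight):
    """Return the text of the hypothesis with the highest fused score.

    hypotheses: list of (text, asr_log_prob, lm_log_prob) tuples.
    """
    best = max(
        hypotheses,
        key=lambda h: shallow_fusion_score(h[1], h[2], lm_weight),
    )
    return best[0]


# Made-up scores: the ASR model slightly prefers the wrong spelling,
# while the LM strongly prefers the correct one.
hypotheses = [
    ("play the beetles", -1.0, -6.0),
    ("play the beatles", -1.2, -2.0),
]

without_lm = pick_best(hypotheses, lm_weight=0.0)
with_lm = pick_best(hypotheses, lm_weight=0.5)
```

With these made-up numbers, the choice of hypothesis flips depending on `lm_weight`, which is the kind of hyperparameter sensitivity the abstract refers to.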
Language identification is critical for many downstream tasks in automatic speech recognition (ASR), and is beneficial to integrate into multilingual end-to-end ASR as an additional task. In this paper, we propose to modify the structure of the cascaded-encoder-based recurrent neural network transducer (RNN-T) model by integrating a per-frame language identifier (LID) predictor. RNN-T with cascaded encoders can achieve streaming ASR with low latency using first-pass decoding with no right-context, and achieve lower word error rates (WERs) using second-pass decoding with longer right-context. By leveraging such differences in the right-contexts and a streaming implementation of statistics pooling, the proposed method can achieve accurate streaming LID prediction with little extra test-time cost. Experimental results on a voice search dataset with 9 language locales show that the proposed method achieves an average of 96.2% LID prediction accuracy and the same second-pass WER as that obtained by including oracle LID in the input.
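One way to picture a streaming form of statistics pooling is to keep running sums so that a mean and standard deviation over all frames seen so far can be emitted at every frame. The sketch below assumes that interpretation; the class name and interface are hypothetical, and the abstract does not specify how the authors implement it.

```python
import math


class StreamingStatsPooling:
    """Running per-dimension mean and standard deviation over frames seen so far."""

    def __init__(self, dim):
        self.count = 0
        self.total = [0.0] * dim      # running sum of features
        self.total_sq = [0.0] * dim   # running sum of squared features

    def update(self, frame):
        """Consume one frame and return [mean..., std...] over all frames so far."""
        self.count += 1
        for i, x in enumerate(frame):
            self.total[i] += x
            self.total_sq[i] += x * x
        mean = [s / self.count for s in self.total]
        # Population variance as E[x^2] - E[x]^2, clamped at zero against rounding.
        std = [
            math.sqrt(max(sq / self.count - m * m, 0.0))
            for sq, m in zip(self.total_sq, mean)
        ]
        return mean + std


pooler = StreamingStatsPooling(dim=1)
first = pooler.update([1.0])
second = pooler.update([3.0])
```

Each update costs a constant amount of work per feature dimension, which is consistent with the abstract's claim of little extra test-time cost, although the actual cost in the paper depends on details not given here.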
We propose a new two-pass E2E speech recognition model that improves ASR performance by training on a combination of paired data and unpaired text data. Previously, the joint acoustic and text decoder (JATD) has shown promising results through the use of text data during model training, and the recently introduced deliberation architecture has reduced recognition errors by leveraging first-pass decoding results. Our method, dubbed Deliberation-JATD, combines the spelling-correcting abilities of deliberation with JATD's use of unpaired text data to further improve performance. The proposed model produces substantial gains across multiple test sets, especially those focused on rare words, where it reduces word error rate (WER) by between 12% and 22.5% relative. This is done without increasing model size or requiring multi-stage training, making Deliberation-JATD an efficient candidate for on-device applications.
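For readers unfamiliar with the metric, the sketch below shows the standard definition of WER as word-level edit distance divided by reference length, and what a "relative" reduction means. The sentences and WER values are made up for illustration and are not results from the paper; the function names are hypothetical.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, 1):
        cur = [i]
        for j, h in enumerate(hyp_words, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def relative_reduction(baseline_wer, new_wer):
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Made-up example: one substituted word out of three.
wer = word_error_rate("play the beatles", "play the beetles")

# Made-up WERs: going from 10.0% to 8.0% is a 20% relative reduction,
# even though the absolute difference is only 2 points.
rel = relative_reduction(10.0, 8.0)
```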