Connectionist temporal classification (CTC) based supervised sequence training of recurrent neural networks (RNNs) has shown great success in many machine learning areas, including end-to-end speech and handwritten character recognition. For CTC training, however, the RNN must be unrolled (or unfolded) over the full length of an input sequence. This unrolling requires a large amount of memory and hinders small-footprint implementations of online learning or adaptation. Furthermore, the lengths of training sequences are usually not uniform, which makes parallel training with multiple sequences inefficient on shared-memory models such as graphics processing units (GPUs). In this work, we introduce an expectation-maximization (EM) based online CTC algorithm that enables unidirectional RNNs to learn sequences that are longer than the amount of unrolling. The RNNs can also be trained to process an infinitely long input sequence without pre-segmentation or external reset. Moreover, the proposed approach allows efficient parallel training on GPUs. For evaluation, phoneme recognition and end-to-end speech recognition examples are presented on the TIMIT and Wall Street Journal (WSJ) corpora, respectively. Our online model achieves a 20.7% phoneme error rate (PER) on a very long input sequence generated by concatenating all 192 utterances in the TIMIT core test set. On WSJ, a network can be trained with an unrolling amount of only 64 at the cost of a 4.5% relative increase in word error rate (WER).
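As a point of reference, the minimal PyTorch sketch below (our own illustration, not the paper's implementation) shows conventional CTC training, where the RNN is unrolled over the entire utterance so that the loss can see the posterior at every frame; this full-sequence unrolling is exactly the memory cost that the online EM-based algorithm avoids by learning from segments shorter than the sequence. The label-set size, feature dimension, and network sizes are assumed for illustration.

```python
# Minimal sketch of conventional (fully unrolled) CTC training with PyTorch.
# The proposed online EM-based CTC algorithm would instead process the
# utterance in fixed-length segments, so the whole sequence never has to be
# kept in memory for backpropagation.
import torch
import torch.nn as nn

torch.manual_seed(0)

num_labels = 62            # assumed label set size, including the CTC blank
feat_dim, hidden = 40, 256 # assumed feature and hidden dimensions

rnn = nn.LSTM(feat_dim, hidden, batch_first=True)   # unidirectional RNN
fc = nn.Linear(hidden, num_labels)
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

# One toy utterance: T input frames, L target labels.
T, L = 300, 40
x = torch.randn(1, T, feat_dim)
targets = torch.randint(1, num_labels, (1, L))       # labels 1..C-1, 0 = blank

# Full unrolling: all T time steps are stored for backprop through time.
logits, _ = rnn(x)
log_probs = fc(logits).log_softmax(-1).transpose(0, 1)  # (T, N, C) for CTCLoss
loss = ctc(log_probs, targets,
           input_lengths=torch.tensor([T]),
           target_lengths=torch.tensor([L]))
loss.backward()
print(float(loss))
```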
The descriptions of complex events usually span multiple sentences, so complete event information must be extracted from the whole document. To address the challenges of document-level event extraction, we propose a novel framework named Document-level Event Extraction as Relation Extraction (DEERE), which is suitable for document-level event extraction tasks without trigger-word labelling. Through a well-designed task transformation, DEERE remodels event extraction as single-stage relation extraction, which mitigates error propagation. An encoder that supports long texts is adopted in the relation extraction model to capture the global context effectively. A fault-tolerant event integration algorithm is designed to improve prediction accuracy. Experimental results show that our approach advances the SOTA on the ChFinAnn dataset by an average F1-score of 3.7. The code and data are available at https://github.com/maomaotfntfn/DEERE.
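For illustration only, the sketch below shows one hypothetical way (not necessarily DEERE's actual transformation) to recast a document-level event record as pairwise relations between its argument entities, together with a tolerant integration step that reassembles the record even when one predicted relation is missing. The event type, role names, and helper functions are invented for this example.

```python
# Hypothetical illustration of "event extraction as relation extraction":
# flatten an event record into argument-pair relations, then integrate
# predicted relations back into a record, tolerating a missing edge.
from collections import defaultdict
from itertools import combinations

def event_to_relations(event_type, role_args):
    """Flatten one event record into (head, relation, tail) triples."""
    rels = []
    for (r1, e1), (r2, e2) in combinations(sorted(role_args.items()), 2):
        rels.append((e1, f"{event_type}:{r1}-{r2}", e2))
    return rels

def integrate(relations):
    """Group predicted relations back into event records.
    Fault tolerant in a simple sense: an argument is kept as long as it
    appears in at least one surviving relation of the event."""
    records = defaultdict(dict)
    for head, label, tail in relations:
        event_type, roles = label.split(":")
        r1, r2 = roles.split("-")
        records[event_type].setdefault(r1, head)
        records[event_type].setdefault(r2, tail)
    return dict(records)

if __name__ == "__main__":
    gold = {"pledger": "CompanyA", "pledgee": "BankB", "share": "1.2M"}
    triples = event_to_relations("EquityPledge", gold)
    # Drop one triple to simulate a prediction miss; integration still
    # recovers the full argument set from the remaining relations.
    print(integrate(triples[:-1]))
```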
To support the intelligent combat capabilities required in future warfare, network information systems call for in-depth research and exploration in the field of artificial intelligence. This paper designs a distributed aggregation storage scheme under the Big Table model that balances the storage load across nodes while aggregating related data locally on each node. It also builds a distributed parallel query engine that uses a Group-By scheme to distribute query trees for parallel computation. A knowledge graph completion method based on Bayesian inference is proposed: Bayesian probabilistic inference and RDF entailment rules are combined to infer potential relationships between entity nodes and to predict the relationships between new nodes and existing nodes, which improves both the efficiency of mining latent factors in the model and the accuracy of predicting unknown relationships. Under the Big Table model, the entity sets stored row by row are evenly partitioned through random-prefix and pre-partitioning operations to achieve load balancing. At the same time, the random prefixes distribute entities of the same type uniformly across nodes while keeping them aggregated by entity category on each node. As knowledge graph technology continues to develop, future knowledge graph learning and reasoning techniques can be integrated with emerging fields such as deep learning, cloud computing, blockchain, big data, and biological genetic engineering to deliver substantial social value.
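The following short Python sketch (an assumption-based illustration, not the paper's code) shows the general idea behind the random-prefix and pre-partition row-key design on a Big Table style store: a bounded, deterministic salt spreads rows of one entity type across pre-split partitions for load balancing, while rows that share a salt remain adjacent in key order and thus stay aggregated on a single node. The bucket count and key layout are assumed.

```python
# Sketch of salted row keys plus pre-split points for a Big Table style store.
import hashlib

NUM_BUCKETS = 8   # assumed number of pre-split partitions (regions)

def salted_row_key(entity_type: str, entity_id: str) -> str:
    # Deterministic "random" prefix: the same entity always maps to one bucket,
    # but different entities of the same type spread across all buckets.
    salt = int(hashlib.md5(entity_id.encode()).hexdigest(), 16) % NUM_BUCKETS
    return f"{salt:02d}|{entity_type}|{entity_id}"

def pre_split_points(num_buckets: int = NUM_BUCKETS):
    # Region split points so that each salt bucket becomes its own partition;
    # within a bucket, keys still sort by entity type, preserving aggregation.
    return [f"{b:02d}|" for b in range(1, num_buckets)]

if __name__ == "__main__":
    keys = sorted(salted_row_key("Person", f"p{i}") for i in range(6))
    print(keys)
    print(pre_split_points())
```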
Transliteration annotation of Russian speech recognition corpora has long been a difficult problem because it is time consuming, inefficient, and hard to manage in terms of personnel and quality control. This paper presents a crowdsourcing-based approach: it designs a crowdsourcing platform within a LAN and recruits Russian majors to complete a large-scale corpus transliteration annotation task in a short time. Analysis and comparison show that the results are comparable to those of traditional methods. The approach provides a reference for improving the efficiency of speech corpus transliteration annotation, and suggestions are put forward for addressing the remaining problems.