In this paper, we present a simple and efficient GEC sequence tagger using a Transformer encoder. Our system is pre-trained on synthetic data and then fine-tuned in two stages: first on errorful corpora, and second on a combination of errorful and error-free parallel corpora. We design custom token-level transformations to map input tokens to target corrections. Our best single-model/ensemble GEC tagger achieves an F0.5 of 65.3/66.5 on CoNLL-2014 (test) and F0.5 of 72.4/73.6 on BEA-2019 (test). Its inference speed is up to 10 times as fast as a Transformer-based seq2seq GEC system. The code and trained models are publicly available.
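To make the tagging scheme concrete, here is a minimal Python sketch of how per-token correction tags could be applied to a source sentence; the tag names ($KEEP, $DELETE, $REPLACE_*, $APPEND_*) and the apply_transformations helper are simplified assumptions for illustration, not the released implementation.

```python
# Illustrative sketch: applying per-token correction tags to a source sentence.
# The tag inventory and this helper are simplified assumptions, not the paper's code.

def apply_transformations(tokens, tags):
    """Apply one tag per source token and return the corrected token list."""
    output = []
    for token, tag in zip(tokens, tags):
        if tag == "$KEEP":
            output.append(token)                     # leave the token unchanged
        elif tag == "$DELETE":
            continue                                 # drop the token
        elif tag.startswith("$REPLACE_"):
            output.append(tag[len("$REPLACE_"):])    # substitute the predicted token
        elif tag.startswith("$APPEND_"):
            output.append(token)
            output.append(tag[len("$APPEND_"):])     # insert a new token after this one
    return output

source = ["He", "go", "to", "the", "school", "every", "day"]
tags   = ["$KEEP", "$REPLACE_goes", "$KEEP", "$DELETE", "$KEEP", "$KEEP", "$KEEP"]
print(apply_transformations(source, tags))
# ['He', 'goes', 'to', 'school', 'every', 'day']
```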
We present TARGER, an open-source neural argument mining framework for tagging arguments in free-text input and for keyword-based retrieval of arguments from an argument-tagged web-scale corpus. The currently available models are pre-trained on three recent argument mining datasets and enable the use of neural argument mining without any reproducibility effort on the user's side. The open-source code ensures portability to other domains and use cases, such as an application to search engine ranking that we also briefly describe.
We propose a novel activation function that implements piecewise orthogonal non-linear mappings based on permutations. It is straightforward to implement, very computationally efficient, and has low memory requirements. We tested it on two toy problems for feedforward and recurrent networks, where it shows performance similar to tanh and ReLU. The OPLU activation function ensures norm preservation of the backpropagated gradients; it is therefore potentially well suited for training deep, extra-deep, and recurrent neural networks.
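A minimal NumPy sketch of one way to realize such a permutation-based unit: consecutive pre-activations are grouped into pairs and each pair is mapped to (max, min), which is a data-dependent permutation of the inputs and therefore leaves the vector norm unchanged. The function name and the pairing of neighboring units are assumptions made for illustration, not the authors' code.

```python
import numpy as np

def oplu(x):
    """Illustrative permutation-based unit: consecutive pre-activations are
    paired and each pair (a, b) is mapped to (max(a, b), min(a, b)).
    Since the output is only a reordering of the inputs, the norm is preserved."""
    pairs = np.asarray(x, dtype=float).reshape(-1, 2)   # assumes an even number of units
    hi = pairs.max(axis=1, keepdims=True)
    lo = pairs.min(axis=1, keepdims=True)
    return np.concatenate([hi, lo], axis=1).reshape(-1)

v = np.array([-1.2, 0.3, 0.5, 2.0])
print(oplu(v))                                        # [ 0.3 -1.2  2.   0.5]
print(np.linalg.norm(v), np.linalg.norm(oplu(v)))     # identical norms
```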
Training of feedforward neural networks for classification problems is considered. The Extended Kalman Filter, which has previously been used mostly for training recurrent neural networks for prediction and control, is suggested as a learning algorithm. An implementation of the cross-entropy error function for mini-batch training is proposed. Popular benchmarks are used to compare the method with gradient descent, conjugate gradients, and the BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm. The influence of mini-batch size on training time and quality is investigated. The algorithms under consideration, implemented as MATLAB scripts, are available for free download.
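As a rough illustration of the idea (not the authors' MATLAB scripts), the sketch below performs one Extended Kalman Filter update for a tiny logistic unit: the weights are the filter state, the network output is the measurement, and H is the Jacobian of the output with respect to the weights. The single-example measurement update, the sigmoid model, and the noise covariances Q and R are assumptions for this example; the paper's cross-entropy mini-batch formulation differs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ekf_step(w, P, x, y, Q, R):
    """One EKF update for a single-output logistic unit: the weight vector w
    is the filter state and the network output is the measurement."""
    y_hat = sigmoid(w @ x)                              # network prediction
    H = (y_hat * (1.0 - y_hat) * x).reshape(1, -1)      # Jacobian d y_hat / d w
    S = H @ P @ H.T + R                                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                      # Kalman gain
    w = w + (K * (y - y_hat)).ravel()                   # weight (state) update
    P = P - K @ H @ P + Q                               # covariance update
    return w, P

# Toy usage with assumed noise covariances Q and R.
rng = np.random.default_rng(0)
d = 3
w, P = rng.normal(size=d), np.eye(d)
Q, R = 1e-4 * np.eye(d), np.array([[0.1]])
for x, y in [(np.array([1.0, 0.5, -0.2]), 1.0),
             (np.array([0.2, -1.0, 0.4]), 0.0)]:
    w, P = ekf_step(w, P, x, y, Q, R)
print(w)
```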