Dependency-to-Dependency Neural Machine Translation

Wu, Shuangzhi; Zhang, Dongdong; Zhang, Zhirui; Yang, Nan; Li, Mu; Zhou, Ming

doi:10.1109/taslp.2018.2855968

Cited by 62 publications

(51 citation statements)

References 22 publications


“…As for tree-based NMT models, a lot of different methods have been proposed. Trees can be used in either the source side (Li, Xiong, Tu, Zhu, Zhang, and Zhou 2017) or the target side (Aharoni and Goldberg 2017), or both (Wu et al 2018), can be encoded either using tree-structured neural networks (Eriguchi et al 2016) or with the help of linearization (Sennrich and Haddow 2016), and can be either constituent trees (Chen, Huang, Chiang, and Chen 2017) or dependency trees (Wu, Zhang, Yang, Li, and Zhou 2017). As for forest-based NMT models, Ma et al 2018 is the first attempt, where linearized packed forests are encoded using RNNs in order to make the model robust to parsing errors.…”

Section: Syntax-based NMT

mentioning

confidence: 99%

“…Syntactic information can be used on either the source-side (Eriguchi, Tsuruoka, and Cho 2017), or the target-side (Aharoni and Goldberg 2017), or both (Wu, Zhang, Zhang, Yang, Li, and Zhou 2018). Syntactic information can be represented as constituent trees (Eriguchi, Hashimoto, and Tsuruoka 2016), packed forests (Ma, Tamura, Utiyama, Zhao, and Sumita 2018), or graphs (Hashimoto and Tsuruoka 2017).…”

mentioning

confidence: 99%


Syntax-based Transformer for Neural Machine Translation

Tamura

Utiyama

et al. 2020

Journal of Natural Language Processing


Polosukhin 2017), which purely depends on attention mechanism, has achieved state-of-the-art performance on machine translation (MT). However, syntactic information, which has improved many previous MT models, has not been utilized explicitly by Transformer. We propose a syntax-based Transformer for MT, which incorporates source-side syntax structures generated by the parser into the self-attention and positional encoding of the encoder. Our method is general in that it is applicable to both constituent trees and packed forests. Evaluations on two language pairs show that our syntax-based Transformer outperforms the conventional (non-syntactic) Transformer. The improvements of BLEUs on English-Japanese, English-Chinese and English-German translation tasks are up to 2.32, 2.91 and 1.03, respectively. Furthermore, our ablation study and qualitative analysis demonstrate that the syntax-based self-attention does well in learning local structural information, while the syntax-based positional encoding does well in learning global structural information.


“…Incorporating morphological information for NMT is a challenging area of research. A significant number of works involve dependency structure at the source side (Eriguchi et al, 2016; Shi et al, 2016; Bastings et al, 2017; Chen et al, 2017; Hashimoto and Tsuruoka, 2017; Li et al, 2017; Wu et al, 2018; Zhang et al, 2019). Eriguchi et al (2016) proposed a syntax-aware encoding mechanism that encodes the source sentence maintaining the hierarchy of its dependency tree.…”

Section: Related Work

mentioning

confidence: 99%

Improving Low-Resource NMT through Relevance Based Linguistic Features Incorporation

Chakrabarty

Dabre

Ding

et al. 2020

Proceedings of the 28th International Conference on Computational Linguistics


In this study, linguistic knowledge at different levels is incorporated into the neural machine translation (NMT) framework to improve translation quality for language pairs with extremely limited data. Integrating manually designed or automatically extracted features into the NMT framework is known to be beneficial. However, this study emphasizes that the relevance of the features is crucial to the performance. Specifically, we propose two methods, 1) self relevance and 2) word-based relevance, to improve the representation of features for NMT. Experiments are conducted on translation tasks from English to eight Asian languages, with no more than twenty thousand sentences for training. The proposed methods improve translation quality for all tasks by up to 3.09 BLEU points. Discussions with visualization provide the explainability of the proposed methods, where we show that the relevance methods provide weights to features, thereby enhancing their impact on low-resource machine translation.


“…Long short-term memory (LSTM), a variant of RNN, has the ability of mining long-distance time-series data information [15]. It is extensively used in machine translation [16,17], fault diagnosis [18,19], speech recognition [20,21], and electrocardiogram classification [22,23]. In literature [24], the representation of speech signals from an original network is automatically learned by CNN, and then the temporal representation of features is learned by LSTM; in literature [25], the features of wearable sensor data are learned by CNN, and then the time dependence between actions is modeled by LSTM.…”

mentioning

confidence: 99%

Few-shot pulse wave contour classification based on multi-scale feature extraction

Liu

Mao

et al. 2021

Sci Rep


The annotation procedure of pulse wave contour (PWC) is expensive and time-consuming, thereby hindering the formation of large-scale datasets to match the requirements of deep learning. To obtain better results under the condition of few-shot PWC, a small-parameter unit structure and a multi-scale feature-extraction model are proposed. In the small-parameter unit structure, information of adjacent cells is transmitted through state variables. Simultaneously, a forgetting gate is used to update the information and retain long-term dependence of PWC in the form of unit series. The multi-scale feature-extraction model is an integrated model containing three parts. Convolution neural networks are used to extract spatial features of single-period PWC and rhythm features of multi-period PWC. Recursive neural networks are used to retain the long-term dependence features of PWC. Finally, an inference layer is used for classification through extracted features. Classification experiments of cardiovascular diseases are performed on a photoplethysmography dataset and a continuous non-invasive blood pressure dataset. Results show that the classification accuracy of the multi-scale feature-extraction model on the two datasets can reach 80% and 96%, respectively.

