Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021
DOI: 10.18653/v1/2021.acl-long.23
Multi-Head Highly Parallelized LSTM Decoder for Neural Machine Translation

Abstract: One of the reasons Transformer translation models are popular is that self-attention networks for context modelling can be easily parallelized at sequence level. However, the computational complexity of a self-attention network is O(n²), increasing quadratically with sequence length. By contrast, the complexity of LSTM-based approaches is only O(n). In practice, however, LSTMs are much slower to train than self-attention networks as they cannot be parallelized at sequence level: to model context, the current…
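To make the contrast in the abstract concrete, the following is a minimal NumPy sketch (not taken from the paper; all weight names and dimensions are illustrative) of the two context-modelling patterns: self-attention builds an (n × n) score matrix in one batched matrix product, so it costs O(n²) but is fully parallel over positions, while a plain recurrent cell does only O(n) work yet must wait for the previous hidden state at every step.

import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 4                                   # toy sequence length and width
X = rng.normal(size=(n, d))                   # one toy input sequence

# Self-attention: every position attends to every other in one batched op.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / np.sqrt(d)                 # (n, n) score matrix: quadratic in n
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
attn_out = weights @ V                        # all positions computed at once

# Recurrent baseline: O(n) total work, but strictly sequential over steps.
W, U = rng.normal(size=(d, d)), rng.normal(size=(d, d))
h = np.zeros(d)
rnn_out = []
for x_t in X:                                 # step t needs the finished h_{t-1}
    h = np.tanh(x_t @ W + h @ U)
    rnn_out.append(h)
rnn_out = np.stack(rnn_out)

Both snippets mix information across positions; the difference the abstract highlights is that the first is a single parallel matrix operation per sequence, while the second is an unavoidably serial loop.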

Cited by 7 publications (3 citation statements) | References: 18 publications

Citation statements (ordered by relevance):
“…Language Models. Language models are widely used in a variety of real-world applications, such as sentiment analysis [12], [50], [65], neural translation [75], [1], and question-answering [15], [30]. Modern language models use Transformer [70] as their backbone and contain billions of parameters, e.g., the minimal version of Stanford Alpaca [68] (an open-source alternative to OpenAI ChatGPT) contains 7 billion parameters.…”
Section: A. Language Models and Prompt-Tuning
confidence: 99%
“…For resource allocation, the presented ESMOML-RAA technique employed the HP-LSTM model. For enabling the LSTM to compute o_t in parallel, the HPLSTM utilizes a bag-of-words representation s_t of previous tokens for the computation of the gates and the hidden layer (HL) [24]:…”
Section: Resource Allocation Using the HP-LSTM Model
confidence: 99%
“…For resource allocation, the presented ESMOML-RAA technique employed the HP-LSTM model. For enabling the LSTM to compute o_t in parallel, the HPLSTM utilizes a bag-of-words representation s_t of previous tokens for the computation of gates and HL [24]: s_t = ∑_{k=1}^{t−1} i_k, whereas s_1 refers to the zero vector. The BoW representation is attained effectually using the cumulative sum function.…”
Section: The Proposed Model
confidence: 99%
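To illustrate the mechanism both excerpts describe, the following is a small NumPy sketch of the sequence-parallel part of an HPLSTM-style layer: the gates and the hidden representation are conditioned on the bag-of-words summary s_t of the preceding tokens (one shifted cumulative sum, with s_1 the zero vector) instead of on the previous hidden state, so they can be evaluated for all timesteps at once. The function name, weight shapes, and activations are illustrative assumptions, not the paper's exact parameterization.

import numpy as np

def hplstm_parallel_part(I, W_gates, W_hidden):
    # I: (n, d) input token representations for one sequence.
    # W_gates, W_hidden: illustrative projection matrices of shape (2d, d).
    n, d = I.shape
    # Bag-of-words summary of *previous* tokens via one shifted cumulative sum:
    # s_1 is the zero vector, s_t = i_1 + ... + i_{t-1} for t > 1.
    S = np.vstack([np.zeros((1, d)), np.cumsum(I, axis=0)[:-1]])
    feats = np.concatenate([I, S], axis=-1)            # (n, 2d)
    gates = 1.0 / (1.0 + np.exp(-(feats @ W_gates)))   # sigmoid gates, all steps at once
    hidden = np.tanh(feats @ W_hidden)                 # candidate hidden, all steps at once
    return S, gates, hidden

# Toy usage: no per-step loop is needed for these quantities.
rng = np.random.default_rng(0)
n, d = 5, 8
I = rng.normal(size=(n, d))
S, gates, hidden = hplstm_parallel_part(
    I, rng.normal(size=(2 * d, d)), rng.normal(size=(2 * d, d)))
assert np.allclose(S[0], 0.0)                          # s_1 is the zero vector

In the full model a lightweight element-wise update over the cell state still runs step by step, but the heavy projections above no longer depend on the previous timestep, which is what allows the decoder to be highly parallelized.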