Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation

Gheini, Mozhdeh; Ren, Xiang; May, Jonathan

doi:10.18653/v1/2021.emnlp-main.132

Cited by 40 publications (14 citation statements)

References 27 publications


“…Subsequently, it was widely applied to a variety of tasks, e.g. image-text classification ( Lee et al , 2018 ) and machine translation ( Gheini et al , 2021 ). These applications have demonstrated the cross-attention mechanism enabling to construct explicit interaction between two separate inputs to fully take advantage of their correlation.…”

Section: Methods (mentioning)

confidence: 99%

CAPLA: improved prediction of protein–ligand binding affinity by a deep learning approach based on a cross-attention mechanism

Jin, Chen, et al. 2023. Bioinformatics


Motivation: Accurate and rapid prediction of protein-ligand binding affinity is a great challenge currently encountered in drug discovery. Recent advances have manifested a promising alternative in applying deep learning-based computational approaches for accurately quantifying binding affinity. The structure complementarity between protein-binding pocket and ligand has a great effect on the binding strength between a protein and a ligand, but most of existing deep learning approaches usually extracted the features of pocket and ligand by these two detached modules.

Results: In this work, a new deep learning approach based on the cross-attention mechanism named CAPLA was developed for improved prediction of protein-ligand binding affinity by learning features from sequence-level information of both protein and ligand. Specifically, CAPLA employs the cross-attention mechanism to capture the mutual effect of protein-binding pocket and ligand. We evaluated the performance of our proposed CAPLA on comprehensive benchmarking experiments on binding affinity prediction, demonstrating the superior performance of CAPLA over state-of-the-art baseline approaches. Moreover, we provided the interpretability for CAPLA to uncover critical functional residues that contribute most to the binding affinity through the analysis of the attention scores generated by the cross-attention mechanism. Consequently, these results indicate that CAPLA is an effective approach for binding affinity prediction and may contribute to useful help for further consequent applications.

Availability: The source code of the method along with trained models are freely available at https://github.com/lennylv/CAPLA.

Supplementary information: Supplementary data are available at Bioinformatics online.
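The excerpt above credits cross-attention with building explicit interaction between two separate inputs, and CAPLA applies it between protein-pocket and ligand features. As a rough illustration only, the sketch below implements single-head scaled dot-product cross-attention in pure Python: queries are projected from one input and keys/values from the other. The sizes, random weights, and the `pocket`/`ligand` names are hypothetical; this is not CAPLA's architecture (its code is in the linked repository), and it omits multiple heads, masking, and training.

```python
import math
import random


def matmul(a, b):
    """Multiply matrices stored as lists of rows: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_input, context_input, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    Queries are projected from one input; keys and values from the other,
    so every output row mixes information from the *other* input.
    Returns (outputs, weights): outputs is n_query x d_head and
    weights is n_query x n_context, each row summing to 1.
    """
    q = matmul(query_input, w_q)
    k = matmul(context_input, w_k)
    v = matmul(context_input, w_v)
    scale = math.sqrt(len(w_k[0]))
    # scores[i][j]: dot product of query i with key j, scaled by sqrt(d_head)
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def rand_matrix(rows, cols, rng):
    """Random matrix with entries in [-1, 1], standing in for learned weights/features."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy demo with hypothetical sizes: 5 "pocket" vectors attend over 3 "ligand" vectors.
rng = random.Random(0)
d_model, d_head = 8, 4
pocket = rand_matrix(5, d_model, rng)
ligand = rand_matrix(3, d_model, rng)
w_q, w_k, w_v = (rand_matrix(d_model, d_head, rng) for _ in range(3))

out, attn = cross_attention(pocket, ligand, w_q, w_k, w_v)
print(len(out), len(out[0]))    # 5 4
print(len(attn), len(attn[0]))  # 5 3
```

Each row of `attn` shows how strongly one query item draws on each context item; inspecting such scores is the kind of analysis the CAPLA abstract describes for identifying influential residues.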


“…All of these existing approaches do not use gaze-signal as input and report loss in accuracy if they do. Some recent approaches also leverage attention-transformers for multi-modal learning, for example: [Gheini et al 2021] uses cross-attention to avoid fine-tuning for language translation models; [Mohla et al 2020] uses attention from Lidar and content from spectral imaging to combine them for image-segmentation; and [Ye et al 2019] uses attention-transformers to segment out the object described in the form of text from a given image. CMA, on the other hand, infers the spatio-temporal relationships across different modalities by combining information from all the modalities via attention-transformers [Vaswani et al 2017] and adaptively updates features for each modality to disseminate the global information from all the modalities.…”

Section: Multi-modal Fusion (mentioning)

confidence: 99%

Can Gaze Inform Egocentric Action Recognition?

Zhang, Crandall, Proulx, et al. 2022. 2022 Symposium on Eye Tracking Research and Applications


We investigate the hypothesis that gaze-signal can improve egocentric action recognition on the standard benchmark, EGTEA Gaze++ dataset. In contrast to prior work where gaze-signal was only used during training, we formulate a novel neural fusion approach, Cross-modality Attention Blocks (CMA), to leverage gaze-signal for action recognition during inference as well. CMA combines information from different modalities at different levels of abstraction to achieve state-of-the-art performance for egocentric action recognition. Specifically, fusing the video-stream with optical-flow with CMA outperforms the current state-of-the-art by 3%. However, when CMA is employed to fuse gaze-signal with video-stream data, no improvements are observed. Further investigation of this counter-intuitive finding indicates that small spatial overlap between the network's attention-map and gaze groundtruth renders the gaze-signal uninformative for this benchmark. Based on our empirical findings, we recommend improvements to the current benchmark to develop practical systems for egocentric video understanding with gaze-signal.

CCS Concepts: • Computing methodologies → Activity recognition and understanding; Neural networks.
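The Zhang et al. excerpt above summarizes Gheini et al. (2021) as using cross-attention to avoid fine-tuning. More precisely, the cited paper reports that fine-tuning only the cross-attention parameters of a pretrained translation model, with the remaining parameters frozen, is nearly as effective as fine-tuning the whole model. The sketch below shows one way to select such a parameter subset by name. The parameter names are hypothetical and assume fairseq-style naming, where the decoder's cross-attention is called `encoder_attn`; the variant that also updates source-side embeddings is an illustrative option, so consult the paper for the exact parameter sets it compares.

```python
import re


def split_trainable(param_names, trainable_patterns):
    """Partition parameter names into (trainable, frozen) lists.

    A name is trainable if any regex in `trainable_patterns` matches it;
    all other parameters keep their pretrained values.
    """
    compiled = [re.compile(p) for p in trainable_patterns]
    trainable, frozen = [], []
    for name in param_names:
        if any(c.search(name) for c in compiled):
            trainable.append(name)
        else:
            frozen.append(name)
    return trainable, frozen


# Hypothetical fairseq-style names; "encoder_attn" denotes the decoder's cross-attention.
names = [
    "encoder.embed_tokens.weight",
    "encoder.layers.0.self_attn.q_proj.weight",
    "encoder.layers.0.fc1.weight",
    "decoder.embed_tokens.weight",
    "decoder.layers.0.self_attn.k_proj.weight",
    "decoder.layers.0.encoder_attn.k_proj.weight",
    "decoder.layers.0.encoder_attn.v_proj.weight",
    "decoder.layers.0.encoder_attn_layer_norm.weight",
    "decoder.layers.0.fc2.weight",
]

# Matches encoder_attn.* and encoder_attn_layer_norm.* (cross-attention and its layer norm).
CROSS_ATTN_ONLY = [r"\.encoder_attn"]
# Illustrative variant for a changed source language: also update source-side embeddings.
CROSS_ATTN_PLUS_SRC_EMB = CROSS_ATTN_ONLY + [r"^encoder\.embed_tokens\."]

trainable, frozen = split_trainable(names, CROSS_ATTN_ONLY)
for n in trainable:
    print("train:", n)
```

With a PyTorch module, the same filter could be applied to the names from `model.named_parameters()`, setting `requires_grad = False` on every frozen parameter so that no gradients are computed for it.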


“…We decided to follow this protocol, in order to isolate the effects on the final BLEU score on the ablated component, and to also prevent the other components from compensating. In concurrent work, Gheini et al (2021) have considered a similar experimental protocol, but to study a different but related phenomenon. In Figure 8, we show the ablation results for the en→de direction.…”

Section: B1 Supervised Translation Ablations (mentioning)

confidence: 99%

Exploring Unsupervised Pretraining Objectives for Machine Translation

Baziotis, Titov, Birch, et al. 2021. Preprint


Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT), by drastically reducing the need for large parallel data. Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder. In this work, we systematically compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context. We pretrain models with different methods on English↔German, English↔Nepali and English↔Sinhala monolingual data, and evaluate them on NMT. In (semi-) supervised NMT, varying the pretraining objective leads to surprisingly small differences in the finetuned performance, whereas unsupervised NMT is much more sensitive to it. To understand these results, we thoroughly study the pretrained models using a series of probes and verify that they encode and use information in different ways. We conclude that finetuning on parallel data is mostly sensitive to few properties that are shared by most models, such as a strong decoder, in contrast to unsupervised NMT that also requires models with strong cross-lingual abilities.
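The Baziotis et al. abstract contrasts masking with pretraining noise that keeps inputs resembling full sentences, produced by reordering and replacing words. The toy sketch below contrasts two such noise functions on whitespace tokens: masking one contiguous span with a hypothetical `<mask>` symbol, and a local shuffle (sort positions by index plus uniform jitter, a scheme common in denoising-based unsupervised MT). It assumes a single masked span and does not implement the context-based word replacement, which needs a language model; it is not the authors' implementation.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol


def mask_span(tokens, ratio, rng):
    """Masking noise: replace one contiguous span covering ~ratio of the tokens with MASK."""
    if not tokens:
        return []
    span = min(len(tokens), max(1, round(len(tokens) * ratio)))
    start = rng.randrange(0, len(tokens) - span + 1)
    return tokens[:start] + [MASK] * span + tokens[start + span:]


def local_shuffle(tokens, k, rng):
    """Reordering noise: sort positions by i + U[0, k), so each token moves fewer than k places.

    The output is still a full sentence made of the original words, just locally permuted.
    """
    keys = [i + rng.random() * k for i in range(len(tokens))]
    order = sorted(range(len(tokens)), key=lambda i: keys[i])
    return [tokens[i] for i in order]


rng = random.Random(1)
sentence = "the cat sat on the mat".split()
print(mask_span(sentence, 0.5, rng))    # three consecutive words replaced by <mask>
print(local_shuffle(sentence, 3, rng))  # same words, locally reordered
```

The masked output no longer looks like a real sentence, whereas the shuffled output keeps every original word, which is the distinction the abstract draws between the two families of objectives.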
