Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2016
DOI: 10.18653/v1/p16-1154

Incorporating Copying Mechanism in Sequence-to-Sequence Learning

Abstract: We address an important problem in sequence-to-sequence (Seq2Seq) learning referred to as copying, in which certain segments in the input sequence are selectively replicated in the output sequence. A similar phenomenon is observable in human language communication. For example, humans tend to repeat entity names or even long phrases in conversation. The challenge with regard to copying in Seq2Seq is that new machinery is needed to decide when to perform the operation. In this paper, we incorporate copying into…
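The abstract describes a decoder that can either generate a word from the vocabulary or copy a token directly from the input sequence. As a rough illustration of that idea (a minimal sketch under stated assumptions, not the CopyNet implementation from the paper; the names mix_generate_and_copy, vocab_dist, copy_attn and p_copy are illustrative), one decoding step can blend a vocabulary softmax with an attention-based copy distribution over source positions:

```python
import numpy as np

def mix_generate_and_copy(vocab_dist, copy_attn, p_copy, src_token_ids):
    """Blend generate-mode and copy-mode probabilities for one decoding step.

    vocab_dist    : (vocab_size,) softmax over the output vocabulary (generate mode)
    copy_attn     : (src_len,) attention weights over source positions (copy mode)
    p_copy        : scalar gate in [0, 1], how much probability mass goes to copying
    src_token_ids : (src_len,) vocabulary id of each source token
    """
    final = (1.0 - p_copy) * vocab_dist
    # Scatter-add copy probabilities onto the vocabulary ids of the source tokens,
    # so a token that occurs several times in the source accumulates mass.
    np.add.at(final, src_token_ids, p_copy * copy_attn)
    return final

# Toy usage: a 10-word vocabulary and a 4-token source sentence.
rng = np.random.default_rng(0)
vocab_dist = rng.dirichlet(np.ones(10))
copy_attn = rng.dirichlet(np.ones(4))
src_ids = np.array([3, 7, 7, 2])
out = mix_generate_and_copy(vocab_dist, copy_attn, p_copy=0.6, src_token_ids=src_ids)
assert abs(out.sum() - 1.0) < 1e-9  # still a valid probability distribution
```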

Cited by 1,318 publications
(1,121 citation statements)
references
References 20 publications
“…Another idea relevant to explore in future work is to consider the networks that are designed to be strong at character copying, which is the most common operation in string transduction tasks such as morphological segmentation, morphological reinflection and normalization (Gu et al., 2016; See et al., 2017; Makarov et al., 2017).…”
Section: Discussion
confidence: 99%
“…Because words of the summary often appear in the source text, the copy network [15,16] is used to address this, and the copy network can also produce unknown words that are not in the vocabulary. Accordingly, we also improved the probability prediction of our model when predicting the summary word at every time step.…”
Section: Advances in Intelligent Systems Research, Volume 147
confidence: 99%
“…According to this, we also improved the probability prediction of our model when predicting the summary word at every time step. In equation (15), the decoder uses the softmax activation to normalize the probability of each predicted word at time t through the fully connected layer whose inputs are s_t and y_{t-1}; P_t, whose calculation is similar to P_{t-1}, is summed with the other probability P_a, which is considered to produce a better summary word from the source text instead of the wild symbols. P_a is defined in equation (16): when the output y_t is in the original text and belongs to the set P, which consists only of the wild symbols defined in the data preprocessing stage, P_a is the probability of…”
Section: Advances in Intelligent Systems Research, Volume 147
confidence: 99%
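The statement above describes a decoder whose per-step softmax output is combined with an extra probability P_a reserved for source tokens that match predefined wild symbols. A hedged sketch of that combination follows; the cited paper's exact equations (15) and (16) are not reproduced here, and the names decode_step, fc_weights, wild_symbol_ids and p_a are assumptions made for illustration:

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def decode_step(s_t, y_prev_embed, fc_weights, src_token_ids, wild_symbol_ids, p_a):
    """One decoding step: a fully connected layer over [s_t; y_{t-1}] followed by a
    softmax, then an extra probability p_a is added to every wild-symbol token that
    also occurs in the source text, and the result is renormalized."""
    logits = fc_weights @ np.concatenate([s_t, y_prev_embed])
    probs = softmax(logits)
    boosted = np.intersect1d(src_token_ids, wild_symbol_ids)  # source tokens that are wild symbols
    probs[boosted] += p_a
    return probs / probs.sum()

# Toy usage: hidden size 8, embedding size 4, vocabulary of 10 words.
rng = np.random.default_rng(1)
fc_weights = rng.normal(size=(10, 12))
dist = decode_step(rng.normal(size=8), rng.normal(size=4), fc_weights,
                   src_token_ids=np.array([3, 7, 2]), wild_symbol_ids=np.array([7, 9]), p_a=0.1)
assert abs(dist.sum() - 1.0) < 1e-9
```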
“…To the best of our knowledge, we are the first to study the adaptation of neural summarization models for a new domain. Furthermore, recent work in neural summarization mainly focuses on specific extensions to improve system performance (Takase et al., 2016; Gu et al., 2016; Ranzato et al., 2015). It is unclear how to adapt the existing neural summarization systems to a new domain when the training data is limited or not available.…”
Section: Related Work
confidence: 99%
“…The sequence-to-sequence architecture of Sutskever et al. (2014), also known as the encoder-decoder architecture, is now the gold standard for many NLP tasks, including machine translation (Sutskever et al., 2014; Bahdanau et al., 2015), question answering, dialogue (Li et al., 2016), caption generation (Xu et al., 2015), and in particular summarization.…”
Section: Introduction
confidence: 99%