With the prevalence of video sharing, there are increasing demands for automatic video digestion such as highlight detection. Recently, platforms with crowdsourced time-sync video comments have emerged worldwide, providing a good opportunity for highlight detection. However, this task is non-trivial: (1) time-sync comments often lag behind their corresponding shots; (2) time-sync comments are semantically sparse and noisy; (3) determining which shots are highlights is highly subjective. The present paper aims to tackle these challenges by proposing a framework that (1) uses concept-mapped lexical chains for lag-calibration; (2) models video highlights based on comment intensity and the combination of emotion and concept concentration in each shot; (3) summarizes each detected highlight using an improved SumBasic with emotion and concept mapping. Experiments on large real-world datasets show that both our highlight detection method and our summarization method outperform other benchmarks by considerable margins.
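For context, the summarization component builds on SumBasic (Nenkova and Vanderwende, 2005), which greedily selects sentences by average unigram probability and squares the probabilities of already-covered words to discourage redundancy. Below is a minimal sketch of the vanilla algorithm only; the emotion and concept mapping extensions are specific to this paper and are not shown, and the whitespace tokenization and `max_words` budget are illustrative choices.

```python
from collections import Counter

def sumbasic(sentences, max_words=50):
    """Vanilla SumBasic: repeatedly pick the sentence whose words have
    the highest average unigram probability, then square the probability
    of each word just used (non-redundancy update)."""
    tokenized = [s.lower().split() for s in sentences]  # naive tokenizer
    counts = Counter(w for toks in tokenized for w in toks)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}

    summary, length = [], 0
    remaining = list(range(len(sentences)))
    while remaining and length < max_words:
        # Score each candidate sentence by its mean word probability.
        best = max(
            remaining,
            key=lambda i: sum(prob[w] for w in tokenized[i])
                          / max(len(tokenized[i]), 1),
        )
        summary.append(sentences[best])
        length += len(tokenized[best])
        # Down-weight covered words: p(w) <- p(w)^2.
        for w in tokenized[best]:
            prob[w] = prob[w] ** 2
        remaining.remove(best)
    return summary
```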
Introduction

Every day, people watch billions of hours of video on YouTube, with half of the views on mobile devices (https://www.youtube.com/yt/press/statistics.html). With the prevalence of video sharing, there is increasing demand for fast video digestion. Imagine a scenario where a user wants to quickly grasp a long video without repeatedly dragging the progress bar to skip unappealing shots. With automatically generated highlights, users could digest the entire video in minutes before deciding whether to watch the full video later. Moreover, automatic video highlight detection and summarization could benefit video indexing, video search, and video recommendation. However, finding highlights in a video is not a trivial task. First, what is considered a "highlight" can be very subjective. Second, a highlight may not always be captured by analyzing low-level image, audio, and motion features. The lack of abstract semantic information has become a bottleneck for highlight detection in traditional video processing.

Recently, crowdsourced time-sync video comments, or "bullet-screen comments," have emerged: comments generated in real time fly over or beside the screen, synchronized with the video frame by frame. They have gained popularity worldwide, for example on niconico in Japan, Bilibili and Acfun in China, and YouTube Live and Twitch Live in the USA. The popularity of time-sync comments suggests new opportunities for video highlight detection based on natural language processing.

Nevertheless, it is still a challenge to detect and label highlights using time-sync comments. First, there is an almost inevitable lag between comments and the shots they refer to. As in Figure 1, ongoing discussion about one shot may extend into the next few shots. Highlight detection and labeling without lag-calibration may produce inaccurate results. Second, time-sync comments are semantically sparse, both in the number of comments per shot and in the number of tokens per comment. Traditional bag-of-words statistical models may...
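To make the comment-intensity signal mentioned above concrete, the sketch below bins time-sync comments into fixed-length shots, smooths the counts, and marks local maxima as candidate highlights. This is an illustrative baseline only, not the paper's full model, which additionally uses emotion and concept concentration and lag-calibration; the shot length and smoothing window are assumed parameters.

```python
import numpy as np

def comment_intensity_highlights(timestamps, video_len, shot_len=10.0, win=3):
    """timestamps: comment times in seconds; video_len: video length in
    seconds. Returns indices of shots that are candidate highlights."""
    n_shots = int(np.ceil(video_len / shot_len))
    counts = np.zeros(n_shots)
    for t in timestamps:
        counts[min(int(t // shot_len), n_shots - 1)] += 1
    # Moving-average smoothing to suppress single-shot noise.
    kernel = np.ones(win) / win
    smooth = np.convolve(counts, kernel, mode="same")
    # Keep shots that are local maxima above the mean intensity.
    return [i for i in range(1, n_shots - 1)
            if smooth[i] > smooth[i - 1]
            and smooth[i] >= smooth[i + 1]
            and smooth[i] > smooth.mean()]
```

Note that without lag-calibration, the peaks found this way will tend to fall one or more shots after the actual highlight, which is precisely the first challenge the paper addresses.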