Dialogue systems in open domain have achieved great success due to the easily obtained single-turn corpus and the development of deep learning, but the multi-turn scenario is still a challenge because of the frequent coreference and information omission. In this paper, we investigate the incomplete utterance restoration which has brought general improvement over multi-turn dialogue systems in recent studies. Meanwhile, inspired by the autoregression for text generation and the sequence labeling for text editing, we propose a novel semi autoregressive generator (SARG) with the high efficiency and flexibility. Moreover, experiments on Restoration-200k show that our proposed model significantly outperforms the state-of-the-art models in terms of quality and inference speed.
Most summarization task focuses on generating relatively short summaries. Such a length constraint might not be appropriate when summarizing scientific work. The LongSumm task needs participants generate long summary for scientific document. This task usual can be solved by language model. But an important problem is that model like BERT is limit to memory, and can not deal with a long input like a document. Also generate a long output is hard. In this paper, we propose a session based automatic summarization model (SBAS) which using a session and ensemble mechanism to generate long summary. And our model achieves the best performance in the LongSumm task.
Dialogue systems in the open domain have achieved great success due to large conversation data and the development of deep learning, but multi-turn scenarios are still a challenge because of the frequent coreference and information omission. In this paper, we investigate the incomplete utterance restoration since it has brought general improvement over multi-turn dialogue systems in recent studies. Inspired by the autoregression for generation and the sequence labeling for text editing, we propose a novel semi autoregressive generator (SARG) with the high efficiency and flexibility. Moreover, experiments on Restoration-200k show that our proposed model significantly outperforms the state-of-the-art models with faster inference speed. 1
This paper describes our solution for Sere-TOD Challenge Track 1: Information extraction from dialog transcripts. We propose a token-pair framework to simultaneously identify entity and value mentions and link them into corresponding triples. As entity mentions are usually coreferent, we adopt a baseline model for coreference resolution. We exploit both annotated transcripts and unsupervised dialogs for training. With model ensemble and post-processing strategies, our system significantly outperforms the baseline solution and ranks first in triple f1 and third in entity f1.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.