Large-Context Conversational Representation Learning: Self-Supervised Learning For Conversational Documents

Masumura, Ryo; Makishima, Naoki; Ihori, Mana; Takashima, Akihiko; Tanaka, Tomohiro; Orihashi, Shota

doi:10.1109/slt48900.2021.9383584

Cited by 2 publications

(8 citation statements)

References 26 publications

“…Utterance-level dialogue sequence labeling is being used for topic segmentation [7][8][9], dialogue act estimation [10][11][12][13][14][15], and call scene segmentation [16][17][18]. Hierarchically structured models consisting of utterance-level and dialogue-level neural networks are often used to efficiently capture contexts within an utterance and between utterances, and an effective self-supervised pretraining method has been proposed [18]. If a hierarchical model is used for dialogue sequence labeling, a large number of parameters are needed to train a model that offers high accuracy.…”

Section: Utterance-level Dialogue Sequence Labeling (mentioning)

confidence: 99%

“…When self-supervised pretraining [18] is utilized, parameters $\{\theta_w, \theta_r, \theta_s, \theta_u\}$ are initialized by pretraining using unlabeled data, and then parameters $\Theta$ are optimized with $L_{\mathrm{HT}}$ in the same way as above.…”

Section: Utterance-level Dialogue Sequence Labeling (mentioning)

confidence: 99%

“…In this paper, we focus on utterance-level dialogue sequence labeling, a key component in dialogue document understanding. Dialogue sequence labeling is often modeled as a supervised learning task that estimates labels for each utterance when given a dialogue document; it is useful in many applications such as topic segmentation [7][8][9], dialogue act estimation [10][11][12][13][14][15], and call scene segmentation [16][17][18]. To understand dialogue documents, it is necessary to consider who spoke what and in what order.…”

Section: Introduction (mentioning)

confidence: 99%

“…Therefore, these techniques often adopt a hierarchically-structured model consisting of an utterance-level network and a dialogue-level network to capture contexts not only within an utterance but also between utterances [16]. In addition, an effective self-supervised pretraining method using only unlabeled data has been proposed [18].…”

Section: Introduction (mentioning)

confidence: 99%

“…• We conduct ablation experiments on dialogue act estimation and call scene segmentation tasks that analyze the effectiveness of the proposed method. We also provide the results achieved by combining self-supervised pretraining [18] and the proposed method.…”

Section: Introduction (mentioning)

confidence: 99%

Hierarchical Knowledge Distillation for Dialogue Sequence Labeling

Orihashi, Yamazaki, Makishima, et al. 2021

Preprint

Self Cite

This paper presents a novel knowledge distillation method for dialogue sequence labeling. Dialogue sequence labeling is a supervised learning task that estimates labels for each utterance in the target dialogue document, and is useful for many applications such as dialogue act estimation. Accurate labeling is often realized by a hierarchically-structured large model consisting of utterance-level and dialogue-level networks that capture the contexts within an utterance and between utterances, respectively. However, due to its large model size, such a model cannot be deployed on resource-constrained devices. To overcome this difficulty, we focus on knowledge distillation, which trains a small model by distilling the knowledge of a large, high-performance teacher model. Our key idea is to distill the knowledge while keeping the complex contexts captured by the teacher model. To this end, the proposed method, hierarchical knowledge distillation, trains the small model by distilling not only the probability distribution of the label classification, but also the knowledge of utterance-level and dialogue-level contexts trained in the teacher model by training the model to mimic the teacher model's output in each level. Experiments on dialogue act estimation and call scene segmentation demonstrate the effectiveness of the proposed method.
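The three-part objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `hierarchical_kd_loss`, the dictionary keys, the temperature, the equal weights, and the choice of KL divergence and mean squared error as the per-level distances are all assumptions made for illustration. The sketch also assumes that teacher and student context vectors already share a dimension, for example after a learned projection.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one utterance's label logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def hierarchical_kd_loss(teacher, student, temperature=2.0,
                         weights=(1.0, 1.0, 1.0)):
    """Hypothetical combined loss, averaged over the utterances of one dialogue.

    Each argument is a dict with per-utterance lists under the assumed keys
    "logits" (label scores), "utterance" (utterance-level context vectors),
    and "dialogue" (dialogue-level context vectors).
    """
    w_label, w_utt, w_dlg = weights
    n = len(teacher["logits"])
    label_term = utt_term = dlg_term = 0.0
    for t in range(n):
        # Label level: match the teacher's softened label distribution.
        p = softmax(teacher["logits"][t], temperature)
        q = softmax(student["logits"][t], temperature)
        label_term += kl_divergence(p, q)
        # Utterance and dialogue levels: mimic the teacher's context vectors.
        utt_term += mse(teacher["utterance"][t], student["utterance"][t])
        dlg_term += mse(teacher["dialogue"][t], student["dialogue"][t])
    return (w_label * label_term + w_utt * utt_term + w_dlg * dlg_term) / n
```

In this sketch the loss is zero when the student reproduces the teacher at every level and grows as the label distributions or the context vectors drift apart. Setting the second and third weights to zero reduces it to ordinary soft-label distillation.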
