2022
DOI: 10.48550/arxiv.2202.09852
Preprint

Cross-Task Knowledge Distillation in Multi-Task Recommendation

Abstract: Multi-task learning has been widely used in real-world recommenders to predict different types of user feedback. Most prior works focus on designing network architectures for the bottom layers as a means to share knowledge about input feature representations. However, since they adopt task-specific binary labels as supervised signals for training, the knowledge about how to accurately rank items is not fully shared across tasks. In this paper, we aim to enhance knowledge transfer for multi-task personalized recommendation…
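The abstract describes the common setup: bottom layers are shared to learn input feature representations, while each task head is supervised only by its own binary feedback. A minimal PyTorch sketch of that shared-bottom baseline (dimensions, module names, and loss are illustrative assumptions, not the paper's exact architecture):

```python
# Minimal sketch (assumed architecture, not the paper's exact model): a
# shared-bottom multi-task recommender in which the bottom layers share
# feature representations, while each task head is trained only on its own
# binary feedback labels, so ranking knowledge stays task-specific.
import torch
import torch.nn as nn

class SharedBottomMTL(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int = 64, num_tasks: int = 2):
        super().__init__()
        # Bottom layers: shared knowledge about input feature representations.
        self.shared_bottom = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # One tower per feedback type (e.g. click, purchase), each with its own logit.
        self.towers = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                          nn.Linear(hidden_dim, 1))
            for _ in range(num_tasks)
        ])

    def forward(self, x: torch.Tensor) -> list:
        h = self.shared_bottom(x)
        return [tower(h).squeeze(-1) for tower in self.towers]

def per_task_bce(logits: list, labels: list) -> torch.Tensor:
    # Task-specific binary cross-entropy: each task sees only its own 0/1
    # feedback, which is the limitation the paper's cross-task distillation targets.
    bce = nn.BCEWithLogitsLoss()
    return sum(bce(logit, y.float()) for logit, y in zip(logits, labels))
```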

Cited by 2 publications (2 citation statements) | References 19 publications
“…To mitigate the discrepancy introduced by the Sparse Relation Challenge in real social graphs, we design a novel calibrator inspired by the Masked Language Model (MLM) [25] from the natural language processing community. Meanwhile, in light of distillation techniques [26], [27], we use the well-trained f(·) from Section III-A, which is trained on the subset U of users with all relation types, as the teacher model, and then train a student model k(·) that is more robust on social graphs with sparse relations. In particular, the input embedding X_u of the teacher model is polluted into X̃_u for the student model, where a small fraction of its relations in Eq.…”
Section: A Cluster-Calibrator-Merge Module
confidence: 99%
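The quoted calibrator combines MLM-style input corruption with teacher-student distillation. A rough PyTorch sketch of that idea, assuming `teacher` and `student` are encoders returning logits and that "polluting" means randomly zeroing a small fraction of embedding entries (names, masking scheme, and loss are assumptions, not taken from the cited paper):

```python
# Rough sketch of the quoted calibrator idea: a teacher f trained on users
# with all relation types guides a student k whose input embedding is
# "polluted" to mimic sparse relations, via a soft-label distillation loss.
import torch
import torch.nn.functional as F

def pollute(x_u: torch.Tensor, mask_prob: float = 0.15) -> torch.Tensor:
    # Zero out a small fraction of embedding entries to mimic missing relations.
    keep = (torch.rand_like(x_u) > mask_prob).float()
    return x_u * keep

def distill_step(teacher, student, x_u: torch.Tensor, temperature: float = 2.0) -> torch.Tensor:
    with torch.no_grad():
        teacher_logits = teacher(x_u)          # teacher sees the clean embedding X_u
    student_logits = student(pollute(x_u))     # student sees the polluted embedding
    # The student matches the teacher's softened output distribution.
    return F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
```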
“…In [29], the authors propose a cross-task knowledge distillation framework consisting of three modules: 1) task augmentation: capturing fine-grained cross-task ranking relations through an auxiliary loss; 2) knowledge distillation: sharing ranked knowledge representations across tasks to enforce consistency.…”
Section: Multi-Task Learning
confidence: 99%
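Based only on the two modules named in the quote, a simplified sketch of how an auxiliary ranking loss and a cross-task consistency term could be combined (loss forms and names are assumptions; the framework's actual objectives may differ):

```python
# Simplified sketch: a BPR-style pairwise loss stands in for "task
# augmentation", and a KL term that pushes one task's scores toward another
# task's ranking stands in for cross-task "knowledge distillation".
import torch
import torch.nn.functional as F

def auxiliary_ranking_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    # Pairwise ranking loss: positive items should be scored above negatives.
    return -F.logsigmoid(pos_scores - neg_scores).mean()

def cross_task_distillation(student_scores: torch.Tensor,
                            teacher_scores: torch.Tensor,
                            temperature: float = 1.0) -> torch.Tensor:
    # Treat the detached teacher task's scores over a candidate list as soft
    # ranking targets for the student task's scores on the same candidates.
    teacher_probs = F.softmax(teacher_scores.detach() / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_scores / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Possible combination with the per-task supervised losses:
# total = bce_click + bce_purchase \
#         + alpha * auxiliary_ranking_loss(pos, neg) \
#         + beta * cross_task_distillation(click_scores, purchase_scores)
```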