Bidirectional Distillation for Top-K Recommender System

Kweon, Wonbin; Kang, SeongKu; Yu, Hwanjo

doi:10.1145/3442381.3449878

Cited by 31 publications

(25 citation statements)

References 24 publications

“…(1) KD by the predictions. Motivated by [5] that matches the class distributions, most existing methods [8,10,12,22,26] have focused on matching the predictions (i.e., recommendation results) from the teacher and the student. The teacher's predictions convey additional information about the subtle difference among the items, helping the student generalize better than directly learning from binary labels [12].…”

Section: Related Work (mentioning)

confidence: 99%

“…Since a user is interested in only a few items, distilling knowledge of a few top-ranked items is effective to discover the user's preferable items [22]. Most recently, [10] utilizes rank-discrepancy information between the predictions from the teacher and the student. Specifically, [10] focuses on distilling the knowledge of the items ranked highly by the teacher but ranked lowly by the student.…”

Section: Related Work (mentioning)

confidence: 99%

“…Most recently, [10] utilizes rank-discrepancy information between the predictions from the teacher and the student. Specifically, [10] focuses on distilling the knowledge of the items ranked highly by the teacher but ranked lowly by the student. On the one hand, [8,26] focus on distilling ranking order information from the teacher's predictions.…”

Section: Related Work (mentioning)

confidence: 99%

“…To tackle this problem, Knowledge Distillation (KD) has been adopted to RS [8,10,12,22,26,31]. KD is a model-independent strategy to improve the performance of a compact model (i.e., student) by transferring the knowledge from a pre-trained large model (i.e., teacher).…”

Section: Introduction (mentioning)

confidence: 99%

“…Most existing KD methods for RS transfer the knowledge from the teacher's predictions [8,10,12,22,26] (Figure 1a). They basically enforce the student to imitate the teacher's recommendation results, providing guidance to the predictions of the student.…”

Section: Introduction (mentioning)

confidence: 99%

Topology Distillation for Recommender System

Kang, Hwang, Kweon, et al. 2021

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining

Self Cite

Recommender Systems (RS) have employed knowledge distillation which is a model compression technique training a compact student model with the knowledge transferred from a pre-trained large teacher model. Recent work has shown that transferring knowledge from the teacher's intermediate layer significantly improves the recommendation quality of the student. However, they transfer the knowledge of individual representation point-wise and thus have a limitation in that primary information of RS lies in the relations in the representation space. This paper proposes a new topology distillation approach that guides the student by transferring the topological structure built upon the relations in the teacher space. We first observe that simply making the student learn the whole topological structure is not always effective and even degrades the student's performance. We demonstrate that because the capacity of the student is highly limited compared to that of the teacher, learning the whole topological structure is daunting for the student. To address this issue, we propose a novel method named Hierarchical Topology Distillation (HTD) which distills the topology hierarchically to cope with the large capacity gap. Our extensive experiments on real-world datasets show that the proposed method significantly outperforms the state-of-the-art competitors. We also provide in-depth analyses to ascertain the benefit of distilling the topology for RS.

CCS Concepts: Information systems → Learning to rank; Collaborative filtering; Retrieval efficiency.