2021 IEEE 37th International Conference on Data Engineering (ICDE)
DOI: 10.1109/icde51399.2021.00291
Improving Conversational Recommender System by Pretraining Billion-scale Knowledge Graph

Abstract: High-quality, large-scale corpora are the cornerstone of building foundation models. In this work, we introduce MATHPILE, a diverse and high-quality math-centric corpus comprising about 9.5 billion tokens. Throughout its creation, we adhered to the principle of "less is more", firmly believing in the supremacy of data quality over quantity, even in the pretraining phase. Our meticulous data collection and processing efforts included a complex suite of preprocessing, prefiltering, language identification, clean…

Cited by 30 publications (7 citation statements)
References 21 publications
“…The DIN [21], DIEN [10], TIEN [22], and MARN [8] approaches use sequential activity information, which we also rely on in our approach. CTR approaches are evaluated on different datasets; some publications and approaches rely only on closed data [13,18,[23][24][25], which are not included in Table 1. Others, as shown in Table 1, use openly available datasets to evaluate their approach.…”
Section: Approaching Click-through Rate Prediction
confidence: 99%
“…Pre-training techniques are widely used in feature-based recommendation systems to enhance user or item representations [46]. Qiu et al [24] propose review encoder pre-training to complement user representations, while Wong et al [39] utilize pre-training on a large-scale knowledge graph for conversational recommender systems. However, these methods are not applicable to medication recommendation.…”
Section: Related Work
confidence: 99%
“…Moreover, it can use side information to enhance the recommendation effectiveness. KG has recently been employed as auxiliary data in a variety of recommendation tasks [1], [2], as well as for news RS [3], [4]. KGs are also widely used in information retrieval [5], digital assistants [6], fraud detection [7], [8], chatbots [9], and question answering (QA) systems [10].…”
Section: Introduction
confidence: 99%