MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information

Garg, Shweta; Meng, Yonggang; Chen, Xiusi

doi:10.48550/arxiv.2111.04022

Cited by 1 publication

(2 citation statements)

References 24 publications

“…However, all these studies focus on the fully supervised setting. Some studies [64][65][66][67] leverage metadata in few-shot text classification. Nevertheless, in LMTC, since the label space is large, it becomes prohibitive to provide even a few training samples for each label.…”

Section: Related Work (mentioning)

confidence: 99%

“…More generally, metadata also exist in Web content such as e-commerce reviews (e.g., reviewer and product information) [64], social media posts (e.g., users and hashtags) [69], and code repositories (e.g., contributors) [70]. Although metadata have been used in fully supervised [68] or single-label [28,64,66,67] text classification, it is largely unexplored in zero-shot LMTC.…”

Section: Introduction (mentioning)

confidence: 99%

Metadata-Induced Contrastive Learning for Zero-Shot Multi-Label Text Classification

Shen, Wu, Xie, et al. 2022

Preprint

Self Cite

Large-scale multi-label text classification (LMTC) aims to associate a document with its relevant labels from a large candidate set. Most existing LMTC approaches rely on massive human-annotated training data, which are often costly to obtain and suffer from a long-tailed label distribution (i.e., many labels occur only a few times in the training set). In this paper, we study LMTC under the zero-shot setting, which does not require any annotated documents with labels and only relies on label surface names and descriptions. To train a classifier that calculates the similarity score between a document and a label, we propose a novel metadata-induced contrastive learning (MICoL) method. Different from previous text-based contrastive learning techniques, MICoL exploits document metadata (e.g., authors, venues, and references of research papers), which are widely available on the Web, to derive similar document-document pairs. Experimental results on two large-scale datasets show that: (1) MICoL significantly outperforms strong zero-shot text classification and contrastive learning baselines; (2) MICoL is on par with the state-of-the-art supervised metadata-aware LMTC method trained on 10K-200K labeled documents; and (3) MICoL tends to predict more infrequent labels than supervised methods, thus alleviating the deteriorated performance on long-tailed labels.

CCS Concepts: • Information systems → Data mining; • Computing methodologies → Classification and regression trees.
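The abstract says that metadata are used to derive similar document-document pairs but does not spell out the rule. The following is a minimal sketch of one plausible reading, in which two documents count as a positive pair when they share at least one value of a chosen metadata field. The function name `metadata_positive_pairs`, the `min_shared` threshold, and the toy records are all assumptions made for illustration; this is not the MICoL authors' implementation.

```python
from itertools import combinations


def metadata_positive_pairs(docs, field, min_shared=1):
    """Return pairs of document ids that share at least `min_shared`
    values of the metadata field `field` (hypothetical pairing rule)."""
    pairs = []
    # Sort by document id so the output order is deterministic.
    for (id_a, meta_a), (id_b, meta_b) in combinations(sorted(docs.items()), 2):
        shared = set(meta_a.get(field, ())) & set(meta_b.get(field, ()))
        if len(shared) >= min_shared:
            pairs.append((id_a, id_b))
    return pairs


# Toy metadata records (invented for this sketch).
docs = {
    "p1": {"authors": ["a1", "a2"], "references": ["r1", "r2"]},
    "p2": {"authors": ["a2"], "references": ["r2", "r3"]},
    "p3": {"authors": ["a3"], "references": ["r4"]},
}

print(metadata_positive_pairs(docs, "authors"))
```

Pairs produced this way could serve as positives for a contrastive objective, with other documents in the batch acting as negatives; how the actual method selects and weights pairs is not stated in the abstract.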

Section: Related Work (mentioning)

confidence: 99%