2021
DOI: 10.48550/arxiv.2111.04022
Preprint

MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information

Abstract: We study the problem of weakly supervised text classification, which aims to classify text documents into a set of pre-defined categories with category surface names only and without any annotated training document provided. Most existing approaches leverage textual information in each document. However, in many domains, documents are accompanied by various types of metadata (e.g., authors, venue, and year of a research paper). These metadata and their combinations may serve as strong category indicators in ad…

Cited by 1 publication (2 citation statements)
References 24 publications
“…However, all these studies focus on the fully supervised setting. Some studies [64][65][66][67] leverage metadata in few-shot text classification. Nevertheless, in LMTC, since the label space is large, it becomes prohibitive to provide even a few training samples for each label.…”
Section: Related Work (mentioning)
confidence: 99%
“…More generally, metadata also exist in Web content such as e-commerce reviews (e.g., reviewer and product information) [64], social media posts (e.g., users and hashtags) [69], and code repositories (e.g., contributors) [70]. Although metadata have been used in fully supervised [68] or single-label [28,64,66,67] text classification, it is largely unexplored in zero-shot LMTC.…”
Section: Introduction (mentioning)
confidence: 99%