Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining 2022
DOI: 10.1145/3488560.3498384

MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information

Abstract: We study the problem of weakly supervised text classification, which aims to classify text documents into a set of pre-defined categories with category surface names only and without any annotated training document provided. Most existing classifiers leverage textual information in each document. However, in many domains, documents are accompanied by various types of metadata (e.g., authors, venue, and year of a research paper). These metadata and their combinations may serve as strong category indicators in a…

citations
Cited by 11 publications
(6 citation statements)
references
References 36 publications
“…TopClus may also be extended to perform hierarchical topic discovery, perhaps via top-down clustering in the latent space. Other related tasks like taxonomy construction [30] and weakly-supervised text classification [29,41,42,45,68] may benefit from the coherent and distinctive topics generated by TopClus.…”
Section: Discussion (mentioning)
confidence: 99%
“…Subsequently, topic-model-based methods emerged [4,13,14,33,34], which inferred category-aware topics from a limited set of seed words. In the last few years, neural methods have gained prominence [22,23,31,36,39]. They trained neural classifiers using pseudo labels of texts, often relying on generated pseudo-texts or PLMs to detect category-indicative keywords.…”
Section: Related Work 2.1 Weakly Supervised Text Classification (mentioning)
confidence: 99%
“…It is significant and challenging to classify these texts into predefined categories, especially when up-to-date labeled data are hard to access due to the dynamic and open nature of the Web. Consequently, there has been a growing interest in weakly supervised text classification (WSTC) [16,23,31,32,39,40], also known as zero-shot or dataless text classification [3, 4, 13, 14, 22-24, 29, 30, 33, 34, 36, 41], which only requires a limited set of seed words (label names) for each category.…”
Section: Introduction (mentioning)
confidence: 99%
“…After the t-th iteration, Amazon review belongs to one or more product categories. We use the subset sampled by Zhang et al. (2020, 2022), which contains 10 categories and 100K reviews. (3) Twitter (Zhang et al., 2017) is a crawl of geo-tagged tweets in New York City from August 2014 to November 2014.…”
Section: Overall Framework (mentioning)
confidence: 99%