“…Weak supervision has been studied for building document classifiers in various forms, including hundreds of labeled training documents (Tang et al., 2015; Miyato et al., 2016; Xu et al., 2017), class/category names (Song and Roth, 2014; Tao et al., 2015; Li et al., 2018), and user-provided seed words (Tao et al., 2015). In this paper, we focus on user-provided seed words as the source of weak supervision. Along this line, Doc2Cube (Tao et al., 2015) expands label keywords from label surface names and performs multidimensional document classification by learning dimension-aware embeddings; PTE (Tang et al., 2015) utilizes both labeled and unlabeled documents to learn task-specific text embeddings, which are then fed to logistic regression classifiers for classification; another line of work leverages seed information to generate pseudo documents and introduces a self-training module that bootstraps on real unlabeled data for model refinement.…”
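To make the seed-word setting concrete, the sketch below illustrates the general idea (not any of the cited methods' actual implementations): documents are pseudo-labeled by matching user-provided seed words, an initial classifier is trained on them, and one self-training round adds high-confidence predictions on the remaining unlabeled documents. The toy corpus, seed words, and the 0.9 confidence threshold are illustrative assumptions.

```python
# Hypothetical sketch of seed-word weak supervision with one self-training round.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# User-provided seed words per class (assumed example).
seed_words = {
    "sports": ["game", "team", "score"],
    "politics": ["election", "senate", "policy"],
}
docs = [
    "the team won the game with a late score",
    "the senate passed a new policy after the election",
    "fans cheered as the game went into overtime",
    "lawmakers debated the policy for hours",
    "the crowd cheered loudly at the stadium",   # no seed word: stays unlabeled
]

classes = list(seed_words)

def pseudo_label(doc):
    """Assign the class whose seed words appear most often, or None if no hit."""
    hits = [sum(w in doc.split() for w in seed_words[c]) for c in classes]
    return classes[int(np.argmax(hits))] if max(hits) > 0 else None

labels = [pseudo_label(d) for d in docs]
labeled = [(d, y) for d, y in zip(docs, labels) if y is not None]
unlabeled = [d for d, y in zip(docs, labels) if y is None]

# Train an initial classifier on the pseudo-labeled documents.
vec = TfidfVectorizer()
X = vec.fit_transform([d for d, _ in labeled])
clf = LogisticRegression(max_iter=1000).fit(X, [y for _, y in labeled])

# One self-training round: adopt confident predictions on unlabeled documents.
if unlabeled:
    proba = clf.predict_proba(vec.transform(unlabeled))
    preds = clf.predict(vec.transform(unlabeled))
    confident = proba.max(axis=1) >= 0.9           # assumed confidence threshold
    new_docs = [d for d, keep in zip(unlabeled, confident) if keep]
    new_labels = [y for y, keep in zip(preds, confident) if keep]
    if new_docs:
        all_docs = [d for d, _ in labeled] + new_docs
        all_labels = [y for _, y in labeled] + new_labels
        X_all = vec.fit_transform(all_docs)        # refit on the enlarged set
        clf = LogisticRegression(max_iter=1000).fit(X_all, all_labels)

print(clf.predict(vec.transform(["the election results surprised the senate"])))
```

The same bootstrap structure underlies self-training in general: the threshold trades off pseudo-label noise against coverage, and real systems replace the bag-of-words matcher and logistic regression with learned embeddings and neural classifiers.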