“…We evaluate the performance of the proposed models on four datasets: NYTimes (NYT), Grolier (GRL), the DBpedia ontology classification dataset (DBP) (Zhang et al., 2015), and 20 Newsgroups (20NG). For the NYTimes and Grolier datasets, we use the processed versions from Wang et al. (2019a). For the DBpedia dataset, we first sample 100,000 documents from the full training set and then preprocess them by tokenization, lemmatization, and removal of stopwords and low-frequency words.…”
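The DBpedia preprocessing steps described above (sampling, tokenization, lemmatization, stopword removal, low-frequency word removal) could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the stopword list, the regex tokenizer, the `naive_lemmatize` placeholder, and the `min_count` threshold are all assumptions, since the excerpt does not specify which tools or cutoffs were used.

```python
import random
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list; the excerpt does not name one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "are", "to", "on"}


def naive_lemmatize(token):
    """Hypothetical placeholder lemmatizer: strips a trailing plural 's'.

    A real pipeline would plug in a proper lemmatizer here.
    """
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def preprocess(docs, sample_size, min_count, seed=0, lemmatize=naive_lemmatize):
    """Sample documents, then tokenize, lemmatize, and filter their words."""
    # Step 1: draw a random sample without replacement (seeded for repeatability).
    rng = random.Random(seed)
    sampled = rng.sample(docs, min(sample_size, len(docs)))

    tokenized = []
    for doc in sampled:
        # Step 2: tokenize by lowercasing and keeping alphabetic runs (assumed scheme).
        tokens = re.findall(r"[a-z]+", doc.lower())
        # Step 3: lemmatize each token.
        tokens = [lemmatize(t) for t in tokens]
        # Step 4: remove stopwords.
        tokens = [t for t in tokens if t not in STOPWORDS]
        tokenized.append(tokens)

    # Step 5: drop words occurring fewer than min_count times in the sampled corpus.
    counts = Counter(t for tokens in tokenized for t in tokens)
    vocab = {t for t, c in counts.items() if c >= min_count}
    return [[t for t in tokens if t in vocab] for tokens in tokenized]


docs = [
    "The cats sat on the mat",
    "A cat and a dog",
    "Dogs chase cats in the park",
]
result = preprocess(docs, sample_size=3, min_count=2)
```

For a setup like the one in the excerpt, `sample_size` would be 100,000; the frequency cutoff would need to come from the paper itself, as it is not stated here.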