“…After pre-processing, the total number of words in the emails_v1, emails_v2, 20 newsgroups, and routers datasets was 1,014, 465, 2,591, and 412, respectively. Four term-weighting schemes were then applied to the words in the bag-of-words (BOW) representation to generate numerical features: term frequency (TF), term presence (TP), term frequency and inverse document frequency (TF-IDF) [36], and term presence and class-specific document frequency (TP-CSDF) [37]. The other three datasets are numerical.…”
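
To make the weighting step concrete, the following is a minimal sketch of three of the four schemes (TF, TP, and TF-IDF) applied to a toy bag-of-words corpus. It is illustrative only and is not the authors' implementation: the function names, the toy documents, and the particular TF-IDF variant (raw count multiplied by log(N / df)) are assumptions, since the excerpt does not state which variant was used. TP-CSDF is left out because its definition is given in [37] and is not reproduced in this excerpt.

```python
import math
from collections import Counter


def build_vocab(docs):
    """Return the sorted list of distinct words across all tokenized documents."""
    return sorted({word for doc in docs for word in doc})


def term_frequency(docs, vocab):
    """TF: raw count of each vocabulary word in each document."""
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[word] for word in vocab])
    return rows


def term_presence(docs, vocab):
    """TP: 1 if the word occurs in the document at least once, else 0."""
    return [[1 if count > 0 else 0 for count in row]
            for row in term_frequency(docs, vocab)]


def tf_idf(docs, vocab):
    """TF-IDF, assumed variant: raw count * log(N / document frequency)."""
    n_docs = len(docs)
    tf_rows = term_frequency(docs, vocab)
    # Document frequency: number of documents containing each word.
    # It is never zero because the vocabulary is built from these documents.
    doc_freq = [sum(1 for row in tf_rows if row[j] > 0)
                for j in range(len(vocab))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    return [[count * weight for count, weight in zip(row, idf)]
            for row in tf_rows]


# Hypothetical pre-processed (tokenized) documents, not taken from the datasets.
docs = [
    ["free", "offer", "free"],
    ["meeting", "offer"],
    ["meeting", "agenda"],
]
vocab = build_vocab(docs)
tf_matrix = term_frequency(docs, vocab)
tp_matrix = term_presence(docs, vocab)
tfidf_matrix = tf_idf(docs, vocab)
```

Each matrix has one row per document and one column per vocabulary word, so any of them can serve as the numerical feature matrix passed to a classifier.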