“…For very large label spaces, extreme multi-label classification has been proposed, e.g., a method based on graph embedding (Tagami, 2017), a method based on a convolutional neural network (CNN) (Liu et al, 2017), and a method based on a neural attention model (Wang et al, 2018). Moreover, the label hierarchy can also be taken into account: part-of, is-a, and inclusion relationships are extracted from external data sources such as Wikipedia and used in the classification task (Bairi et al, 2016; Xie et al, 2017).…”