“…• Sequence-, graph-, and N-gram-based models. These models first transform the text dataset into sequences of words, graphs of words, or N-gram features, and then apply various deep learning models to those features, including CNN (Kim, 2014b), CNN-RNN (Chen et al., 2017), RCNN (Lai et al., 2015), DCNN (Schwenk et al., 2017), XML-CNN (Liu et al., 2017), HR-DGCNN (Peng et al., 2018), Hierarchical LSTM (HLSTM) (Chen et al., 2016), a multi-label classification approach based on a conditional cyclic directed graphical model (CDN-SVM) (Guo and Gu, 2011), Hierarchical Attention Network (HAN) (Yang et al., 2016), and Bi-directional Block Self-Attention Network (Bi-BloSAN) (Shen et al., 2018), for the multi-label classification task. For example, HAN uses a GRU gating mechanism to encode word sequences and applies word-level and sentence-level attention over those sequences for document classification.…”
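The HAN architecture mentioned above can be sketched in PyTorch: a bidirectional GRU encodes the words of each sentence, an additive attention layer pools them into a sentence vector, and the same pattern repeats at the sentence level to build a document vector. This is a minimal illustrative sketch, not the authors' implementation; all dimensions and the `AttentionPool`/`HAN` class names are assumptions.

```python
import torch
import torch.nn as nn


class AttentionPool(nn.Module):
    """Additive attention pooling in the style of HAN (Yang et al., 2016)."""

    def __init__(self, hidden_dim, attn_dim):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, attn_dim)
        # Learned context vector (u_w at word level, u_s at sentence level)
        self.context = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, h):                              # h: (batch, seq_len, hidden_dim)
        u = torch.tanh(self.proj(h))                   # hidden representation
        alpha = torch.softmax(self.context(u), dim=1)  # attention weights over seq_len
        return (alpha * h).sum(dim=1)                  # weighted sum: (batch, hidden_dim)


class HAN(nn.Module):
    """Minimal hierarchical attention sketch: word-level GRU + attention,
    then sentence-level GRU + attention, then a linear classifier.
    Dimensions here are illustrative, not taken from the paper."""

    def __init__(self, vocab_size, embed_dim=50, hidden=32, num_classes=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.word_gru = nn.GRU(embed_dim, hidden, bidirectional=True, batch_first=True)
        self.word_attn = AttentionPool(2 * hidden, 2 * hidden)
        self.sent_gru = nn.GRU(2 * hidden, hidden, bidirectional=True, batch_first=True)
        self.sent_attn = AttentionPool(2 * hidden, 2 * hidden)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, docs):                           # docs: (batch, n_sents, n_words)
        b, n_sents, n_words = docs.shape
        words = self.embed(docs.view(b * n_sents, n_words))
        h_w, _ = self.word_gru(words)                  # encode words in each sentence
        sent_vecs = self.word_attn(h_w).view(b, n_sents, -1)
        h_s, _ = self.sent_gru(sent_vecs)              # encode the sentence sequence
        doc_vec = self.sent_attn(h_s)                  # pool into a document vector
        return self.fc(doc_vec)                        # class logits


model = HAN(vocab_size=100)
docs = torch.randint(0, 100, (2, 3, 5))                # 2 docs, 3 sentences, 5 words
logits = model(docs)
print(logits.shape)                                    # torch.Size([2, 4])
```

For a multi-label setting such as the one surveyed here, the final softmax/cross-entropy objective would typically be replaced by a per-label sigmoid with binary cross-entropy.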