“…Representing words or relations with continuous vectors (Mikolov et al., 2013; Ji and Eisenstein, 2014) embeds their semantics in a shared space, which helps alleviate the data sparseness problem and enables end-to-end and multi-task learning. Recurrent neural networks (RNNs) (Graves, 2012) and variants such as Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) and Gated Recurrent Unit (GRU) (Cho et al., 2014) networks capture long-distance dependencies well on tasks such as Named Entity Recognition (NER) (Chiu and Nichols, 2016; Ma and Hovy, 2016), dependency parsing (Dyer et al., 2015), and semantic composition of documents (Tang et al., 2015). This work describes a hierarchical neural architecture with multiple label outputs for modeling the discourse mode sequence of sentences.…”
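To make the idea of a hierarchical architecture with per-sentence label outputs concrete, the sketch below shows one plausible realization: a word-level GRU composes word embeddings into a sentence vector, a sentence-level GRU models the sequence of sentence vectors, and a linear layer emits one discourse-mode label distribution per sentence. This is a minimal illustration in PyTorch under assumed dimensions and class names (HierarchicalDiscourseTagger, num_modes, etc.), not the authors' exact model.

```python
import torch
import torch.nn as nn

class HierarchicalDiscourseTagger(nn.Module):
    """Hypothetical hierarchical sentence-sequence labeler (illustrative only)."""

    def __init__(self, vocab_size, emb_dim=100, word_hidden=128,
                 sent_hidden=128, num_modes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Word-level encoder: composes word embeddings into a sentence vector.
        self.word_rnn = nn.GRU(emb_dim, word_hidden, batch_first=True)
        # Sentence-level encoder: models the sequence of sentence vectors.
        self.sent_rnn = nn.GRU(word_hidden, sent_hidden, batch_first=True)
        # One label output (discourse mode) per sentence.
        self.out = nn.Linear(sent_hidden, num_modes)

    def forward(self, sentences):
        # sentences: list of LongTensors, each (num_words,) for one sentence
        sent_vecs = []
        for words in sentences:
            emb = self.embed(words).unsqueeze(0)   # (1, num_words, emb_dim)
            _, h = self.word_rnn(emb)              # h: (1, 1, word_hidden)
            sent_vecs.append(h.squeeze(0))         # (1, word_hidden)
        doc = torch.stack(sent_vecs, dim=1)        # (1, num_sents, word_hidden)
        states, _ = self.sent_rnn(doc)             # (1, num_sents, sent_hidden)
        return self.out(states)                    # (1, num_sents, num_modes)

# Toy usage: a three-sentence document with random word ids.
model = HierarchicalDiscourseTagger(vocab_size=1000)
doc = [torch.randint(0, 1000, (n,)) for n in (7, 12, 5)]
logits = model(doc)  # one discourse-mode score vector per sentence
```

The two-level recurrence mirrors the hierarchical design named in the excerpt: the lower level handles within-sentence composition, while the upper level captures dependencies between consecutive discourse modes across the document.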