Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)
DOI: 10.18653/v1/d18-1279

Learning Better Internal Structure of Words for Sequence Labeling

Abstract: Character-based neural models have recently proven very useful for many NLP tasks. However, there is a gap of sophistication between methods for learning representations of sentences and words. While most character models for learning representations of sentences are deep and complex, models for learning representations of words are shallow and simple. Also, in spite of considerable research on learning character embeddings, it is still not clear which kind of architecture is the best for capturing character-t…
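The abstract contrasts deep character models used for sentence representations with the shallow compositions typically used to build word representations from characters. As a point of reference only, here is a minimal sketch of one such shallow character-to-word encoder (a single convolution over character embeddings followed by max-pooling); it illustrates the conventional approach the abstract refers to, not the architecture proposed in this paper, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class CharCNNWordEncoder(nn.Module):
    """Shallow character-to-word encoder: embed characters, apply one
    1-D convolution, and max-pool over character positions.
    Illustrative only; sizes are assumptions, not this paper's model."""

    def __init__(self, n_chars, char_dim=30, n_filters=50, kernel_size=3, pad_id=0):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=pad_id)
        self.conv = nn.Conv1d(char_dim, n_filters, kernel_size, padding=kernel_size // 2)

    def forward(self, char_ids):
        # char_ids: (batch_of_words, max_word_len)
        x = self.char_emb(char_ids)       # (B, L, char_dim)
        x = x.transpose(1, 2)             # (B, char_dim, L) for Conv1d
        x = torch.relu(self.conv(x))      # (B, n_filters, L)
        word_repr, _ = x.max(dim=2)       # pool over characters -> (B, n_filters)
        return word_repr

# Usage: encode the word "cats" with a toy character vocabulary.
enc = CharCNNWordEncoder(n_chars=128)
ids = torch.tensor([[ord(c) % 128 for c in "cats"]])
print(enc(ids).shape)  # torch.Size([1, 50])
```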

Cited by 37 publications (36 citation statements)
References 25 publications

“…Model                                    Accuracy
Plank et al. (2016)                        97.22
Huang et al. (2015)                        97.55
Ma and Hovy (2016)                         97.55
(model name missing)                       97.53
(model name missing)                       97.51
Zhang et al. (2018c)                       97.55
Yasunaga et al. (2018)                     97.58
Xin et al. (2018)                          97.58
Transformer-softmax (Guo et al., 2019)     97.04
BiLSTM-softmax                             97.51
BiLSTM-CRF                                 97.51
BiLSTM-LAN                                 97.65
As can be seen, a multi-layer model with larger hidden sizes does not give significantly better results compared to a 1-layer model with a hidden size of 400. We thus chose the latter for the final model.…”
Section: Development Experiments (mentioning)
confidence: 99%
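The development comparison above varies only the depth and hidden size of the word-level BiLSTM encoder. Below is a minimal PyTorch sketch of such a configurable BiLSTM-softmax tagger, with placeholder vocabulary, embedding, and tagset sizes; the 1-layer, hidden-size-400 setting mirrors the configuration the cited authors report keeping, while the deeper setting shown is only an arbitrary example.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """BiLSTM-softmax sequence labeler whose depth and hidden size are the
    hyperparameters compared in the cited development experiment.
    Vocabulary, embedding, and tagset sizes are placeholders."""

    def __init__(self, vocab_size, n_tags, emb_dim=100, hidden_size=400, num_layers=1):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_size, num_layers=num_layers,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_size, n_tags)

    def forward(self, token_ids):
        h, _ = self.lstm(self.emb(token_ids))   # (B, T, 2 * hidden_size)
        return self.out(h)                      # per-token tag scores

# The two kinds of settings being compared: 1 layer x 400 hidden units
# versus a deeper, wider stack (sizes of the deep variant are assumed).
shallow = BiLSTMTagger(vocab_size=10000, n_tags=45, hidden_size=400, num_layers=1)
deep = BiLSTMTagger(vocab_size=10000, n_tags=45, hidden_size=600, num_layers=3)
```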
“…We set batch size to 4096 at the token level, transition number to 4, hidden size of sequence labeling encoder and decoder to 256, hidden size of global contextual encoder to 128.
Model                              F1 (CoNLL-2003 NER)
(Zhang et al., 2018)               91.57
(Yang et al., 2017a)               91.62
(Chiu and Nichols, 2016)* †        91.62 ± 0.33
(Xin et al., 2018)                 91.64 ± 0.17
GCDT                               91.96 ± 0.04
GCDT + BERT_LARGE                  93.47 ± 0.03
Model                              F1 (CoNLL-2000 chunking)
(Yang et al., 2017b)               94.66
(Zhai et al., 2017)                94.72
(Hashimoto et al., 2017)           95.02
(Søgaard and Goldberg, 2016)       95.28
(Xin et al., 2018)                 95.29 ± 0.08
GCDT                               95.43 ± 0.06
GCDT + BERT_LARGE                  97.30 ± 0.03…”
Section: Implementation Details (mentioning)
confidence: 99%
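A batch size of 4096 "at the token level" means sentences are packed into a batch until their total token count reaches the budget, rather than fixing the number of sentences per batch. The sketch below illustrates that packing with a simple greedy loop; the function name and strategy are assumptions for illustration, not the cited GCDT implementation.

```python
from typing import Iterable, List

def token_level_batches(sentences: Iterable[List[str]],
                        max_tokens: int = 4096) -> Iterable[List[List[str]]]:
    """Greedily pack sentences into batches whose total token count stays
    within `max_tokens`. Illustrative sketch, not the cited GCDT code."""
    batch, n_tokens = [], 0
    for sent in sentences:
        if batch and n_tokens + len(sent) > max_tokens:
            yield batch
            batch, n_tokens = [], 0
        batch.append(sent)
        n_tokens += len(sent)
    if batch:
        yield batch

# Usage: three short sentences packed under a tiny budget of 8 tokens.
sents = [["EU", "rejects", "German", "call"],
         ["Peter", "Blackburn"],
         ["BRUSSELS", "1996-08-22", "."]]
print([len(b) for b in token_level_batches(sents, max_tokens=8)])  # [2, 1]
```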
“…We set batch size to 4096 at the token level, transition number to 4, hidden size of sequence labeling encoder and decoder to 256, hidden size of global contextual encoder to 128.
Model                              F1
(Yang et al., 2017b)               94.66
(Zhai et al., 2017)                94.72
(Hashimoto et al., 2017)           95.02
(Søgaard and Goldberg, 2016)       95.28
(Xin et al., 2018)                 95.29 ± 0.08
GCDT                               95.43 ± 0.06
GCDT + BERT_LARGE                  97.30 ± 0.03
Table 2: F1 scores on the CoNLL-2000 chunking task. * refers to adopting external task-specific resources (like Gazetteers or annotated data).…”
Section: Implementation Details (mentioning)
confidence: 99%
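The F1 scores in these tables are span-level: a predicted chunk or entity counts as correct only if both its boundaries and its label exactly match a gold span, as in the standard CoNLL evaluation. The sketch below computes that metric from BIO tag sequences; it is a simplified stand-in for illustration, not the official conlleval script (for example, it ignores chunks that begin with a stray I- tag).

```python
def bio_spans(tags):
    """Extract (label, start, end) spans from a BIO tag sequence."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):   # sentinel flushes the last open span
        if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and tag[2:] != label):
            if start is not None:
                spans.append((label, start, i))
            start, label = (i, tag[2:]) if tag.startswith("B-") else (None, None)
        # a well-formed I- continuation simply extends the open span
    return spans

def span_f1(gold_seqs, pred_seqs):
    """Micro-averaged span-level F1 over a corpus of BIO tag sequences."""
    gold = {(i, s) for i, seq in enumerate(gold_seqs) for s in bio_spans(seq)}
    pred = {(i, s) for i, seq in enumerate(pred_seqs) for s in bio_spans(seq)}
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Usage: one NP chunk predicted correctly, one VP chunk missed.
gold = [["B-NP", "I-NP", "B-VP", "O"]]
pred = [["B-NP", "I-NP", "O", "O"]]
print(round(span_f1(gold, pred), 2))  # 0.67
```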
“…After architecture search, we test the transferability of the learned architecture. In order to apply the model to other tasks, we directly use the architecture searched on WikiText-103 and train the parameters with the in-domain data. In our experiments, we adapt the model to the CoNLL-2003 and WNUT-2017 NER tasks and CoNLL-2000 chunking. For the two NER tasks, it achieves new state-of-the-art F1 scores (Table 4 and Table 5).
Models                                                           F1
Cross-BiLSTM-CNN (Aguilar et al., 2018) (Yang and Zhang, 2018)   95.06
BiLSTM-CRF + IntNet (Xin et al., 2018)                           95.29
Flair (Akbik et al., 2019)                                       96.72
GCDT + BERT_LARGE (Liu et al., 2019b)                            97…”
Section: Transferring To Other Tasks (mentioning)
confidence: 99%