Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)
DOI: 10.18653/v1/d18-1279

Learning Better Internal Structure of Words for Sequence Labeling

Abstract: Character-based neural models have recently proven very useful for many NLP tasks. However, there is a gap of sophistication between methods for learning representations of sentences and words. While most character models for learning representations of sentences are deep and complex, models for learning representations of words are shallow and simple. Also, in spite of considerable research on learning character embeddings, it is still not clear which kind of architecture is the best for capturing character-t…
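The abstract contrasts deep character models used for sentence representations with the shallow compositions typically used to build word representations from characters. As a point of reference only, here is a minimal sketch of one such shallow character-to-word encoder (a single convolution over character embeddings followed by max-pooling); it illustrates the conventional approach the abstract refers to, not the architecture proposed in this paper, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class CharCNNWordEncoder(nn.Module):
    """Shallow character-to-word encoder: embed characters, apply one
    1-D convolution, and max-pool over character positions.
    Illustrative only; sizes are assumptions, not this paper's model."""

    def __init__(self, n_chars, char_dim=30, n_filters=50, kernel_size=3, pad_id=0):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=pad_id)
        self.conv = nn.Conv1d(char_dim, n_filters, kernel_size, padding=kernel_size // 2)

    def forward(self, char_ids):
        # char_ids: (batch_of_words, max_word_len)
        x = self.char_emb(char_ids)       # (B, L, char_dim)
        x = x.transpose(1, 2)             # (B, char_dim, L) for Conv1d
        x = torch.relu(self.conv(x))      # (B, n_filters, L)
        word_repr, _ = x.max(dim=2)       # pool over characters -> (B, n_filters)
        return word_repr

# Usage: encode the word "cats" with a toy character vocabulary.
enc = CharCNNWordEncoder(n_chars=128)
ids = torch.tensor([[ord(c) % 128 for c in "cats"]])
print(enc(ids).shape)  # torch.Size([1, 50])
```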

Cited by 37 publications (36 citation statements)
References 25 publications

“…Model                                    Accuracy
Plank et al. (2016)                        97.22
Huang et al. (2015)                        97.55
Ma and Hovy (2016)                         97.55
(model name missing)                       97.53
(model name missing)                       97.51
Zhang et al. (2018c)                       97.55
Yasunaga et al. (2018)                     97.58
Xin et al. (2018)                          97.58
Transformer-softmax (Guo et al., 2019)     97.04
BiLSTM-softmax                             97.51
BiLSTM-CRF                                 97.51
BiLSTM-LAN                                 97.65
As can be seen, a multi-layer model with larger hidden sizes does not give significantly better results compared to a 1-layer model with a hidden size of 400. We thus chose the latter for the final model.…”
Section: Development Experiments (mentioning)
confidence: 99%
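The development comparison above varies only the depth and hidden size of the word-level BiLSTM encoder. Below is a minimal PyTorch sketch of such a configurable BiLSTM-softmax tagger, with placeholder vocabulary, embedding, and tagset sizes; the 1-layer, hidden-size-400 setting mirrors the configuration the cited authors report keeping, while the deeper setting shown is only an arbitrary example.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """BiLSTM-softmax sequence labeler whose depth and hidden size are the
    hyperparameters compared in the cited development experiment.
    Vocabulary, embedding, and tagset sizes are placeholders."""

    def __init__(self, vocab_size, n_tags, emb_dim=100, hidden_size=400, num_layers=1):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_size, num_layers=num_layers,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_size, n_tags)

    def forward(self, token_ids):
        h, _ = self.lstm(self.emb(token_ids))   # (B, T, 2 * hidden_size)
        return self.out(h)                      # per-token tag scores

# The two kinds of settings being compared: 1 layer x 400 hidden units
# versus a deeper, wider stack (sizes of the deep variant are assumed).
shallow = BiLSTMTagger(vocab_size=10000, n_tags=45, hidden_size=400, num_layers=1)
deep = BiLSTMTagger(vocab_size=10000, n_tags=45, hidden_size=600, num_layers=3)
```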
“…We set batch size to 4096 at the token level, transition number to 4, hidden size of sequence labeling encoder and decoder to 256, hidden size of global contextual encoder to 128.
Model                              F1 (CoNLL-2003 NER)
(Zhang et al., 2018)               91.57
(Yang et al., 2017a)               91.62
(Chiu and Nichols, 2016)* †        91.62 ± 0.33
(Xin et al., 2018)                 91.64 ± 0.17
GCDT                               91.96 ± 0.04
GCDT + BERT_LARGE                  93.47 ± 0.03
Model                              F1 (CoNLL-2000 chunking)
(Yang et al., 2017b)               94.66
(Zhai et al., 2017)                94.72
(Hashimoto et al., 2017)           95.02
(Søgaard and Goldberg, 2016)       95.28
(Xin et al., 2018)                 95.29 ± 0.08
GCDT                               95.43 ± 0.06
GCDT + BERT_LARGE                  97.30 ± 0.03…”
Section: Implementation Details (mentioning)
confidence: 99%
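A batch size of 4096 "at the token level" means sentences are packed into a batch until their total token count reaches the budget, rather than fixing the number of sentences per batch. The sketch below illustrates that packing with a simple greedy loop; the function name and strategy are assumptions for illustration, not the cited GCDT implementation.

```python
from typing import Iterable, List

def token_level_batches(sentences: Iterable[List[str]],
                        max_tokens: int = 4096) -> Iterable[List[List[str]]]:
    """Greedily pack sentences into batches whose total token count stays
    within `max_tokens`. Illustrative sketch, not the cited GCDT code."""
    batch, n_tokens = [], 0
    for sent in sentences:
        if batch and n_tokens + len(sent) > max_tokens:
            yield batch
            batch, n_tokens = [], 0
        batch.append(sent)
        n_tokens += len(sent)
    if batch:
        yield batch

# Usage: three short sentences packed under a tiny budget of 8 tokens.
sents = [["EU", "rejects", "German", "call"],
         ["Peter", "Blackburn"],
         ["BRUSSELS", "1996-08-22", "."]]
print([len(b) for b in token_level_batches(sents, max_tokens=8)])  # [2, 1]
```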
“…We set batch size to 4096 at the token level, transition number to 4, hidden size of sequence labeling encoder and decoder to 256, hidden size of global contextual encoder to 128.
Model                              F1
(Yang et al., 2017b)               94.66
(Zhai et al., 2017)                94.72
(Hashimoto et al., 2017)           95.02
(Søgaard and Goldberg, 2016)       95.28
(Xin et al., 2018)                 95.29 ± 0.08
GCDT                               95.43 ± 0.06
GCDT + BERT_LARGE                  97.30 ± 0.03
Table 2: F1 scores on the CoNLL-2000 chunking task. * refers to adopting external task-specific resources (like Gazetteers or annotated data).…”
Section: Implementation Details (mentioning)
confidence: 99%
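The F1 scores in these tables are span-level: a predicted chunk or entity counts as correct only if both its boundaries and its label exactly match a gold span, as in the standard CoNLL evaluation. The sketch below computes that metric from BIO tag sequences; it is a simplified stand-in for illustration, not the official conlleval script (for example, it ignores chunks that begin with a stray I- tag).

```python
def bio_spans(tags):
    """Extract (label, start, end) spans from a BIO tag sequence."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):   # sentinel flushes the last open span
        if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and tag[2:] != label):
            if start is not None:
                spans.append((label, start, i))
            start, label = (i, tag[2:]) if tag.startswith("B-") else (None, None)
        # a well-formed I- continuation simply extends the open span
    return spans

def span_f1(gold_seqs, pred_seqs):
    """Micro-averaged span-level F1 over a corpus of BIO tag sequences."""
    gold = {(i, s) for i, seq in enumerate(gold_seqs) for s in bio_spans(seq)}
    pred = {(i, s) for i, seq in enumerate(pred_seqs) for s in bio_spans(seq)}
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Usage: one NP chunk predicted correctly, one VP chunk missed.
gold = [["B-NP", "I-NP", "B-VP", "O"]]
pred = [["B-NP", "I-NP", "O", "O"]]
print(round(span_f1(gold, pred), 2))  # 0.67
```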
“…After architecture search, we test the transferability of the learned architecture. In order to apply the model to other tasks, we directly use the architecture searched on WikiText-103 and train the parameters with the in-domain data. In our experiments, we adapt the model to the CoNLL-2003 and WNUT-2017 NER tasks and CoNLL-2000 chunking. For the two NER tasks, it achieves new state-of-the-art F1 scores (Table 4 and Table 5).
Models                                                           F1
Cross-BiLSTM-CNN (Aguilar et al., 2018) (Yang and Zhang, 2018)   95.06
BiLSTM-CRF + IntNet (Xin et al., 2018)                           95.29
Flair (Akbik et al., 2019)                                       96.72
GCDT + BERT_LARGE (Liu et al., 2019b)                            97…”
Section: Transferring To Other Tasks (mentioning)
confidence: 99%