“…For example, the second-best architecture [34] was based on Convolutional Neural Networks (CNNs), BiLSTMs, a CRF, and multi-head self-attention, employing features such as part-of-speech tags, ELMo embeddings [41], and Word2Vec embeddings [42]. Sarabadani [43] also used LSTMs and CNNs, combined with ELMo embeddings and three specialized lexicon sets, while Lopez et al. [44] used a CRF with GloVe embeddings [45]. The other half of the proposed models were all based on the recently introduced BERT and its variants, including the best architecture for 2019 [33], which employed an ensemble of BioBERTs with a CRF module.…”
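Several of the systems above place a CRF layer on top of a neural encoder (a BiLSTM or BERT): the encoder produces per-token tag scores, and the CRF adds tag-transition scores so that the best *sequence* of tags is decoded jointly via the Viterbi algorithm. The sketch below, with hypothetical toy scores standing in for encoder outputs (none of the numbers come from the cited papers), shows just that decoding step:

```python
# Minimal sketch of CRF Viterbi decoding over per-token tag scores.
# `emissions` stands in for the output of a BiLSTM/BERT encoder and
# `transitions` for learned tag-transition scores; both are toy values.

def viterbi_decode(emissions, transitions):
    """emissions: [T][K] per-token tag scores; transitions: [K][K] score
    of moving from tag i to tag j. Returns the best-scoring tag sequence."""
    T, K = len(emissions), len(emissions[0])
    score = list(emissions[0])   # best path score ending in each tag so far
    back = []                    # backpointers, one [K]-list per later step
    for t in range(1, T):
        new_score, ptrs = [], []
        for j in range(K):
            best_i = max(range(K), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j]
                             + emissions[t][j])
            ptrs.append(best_i)
        score, back = new_score, back + [ptrs]
    # Trace backpointers from the best final tag to recover the path.
    best = max(range(K), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(back):
        best = ptrs[best]
        path.append(best)
    return path[::-1]

# Toy run: 3 tokens, 2 tags (0 = O, 1 = B-ENT).
emissions = [[2.0, 0.5], [0.2, 1.5], [1.0, 1.2]]
transitions = [[0.5, 0.0], [0.0, 1.0]]
print(viterbi_decode(emissions, transitions))  # → [0, 1, 1]
```

The joint decoding is what distinguishes a CRF head from independent per-token softmax classification: the transition scores let the model penalize invalid tag sequences (e.g. an I- tag without a preceding B- tag in BIO tagging).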