“…SAPK4) often with dashes in different places and various misspellings. In principle, NLP systems should be able to overcome these inconsistencies via robust grounding algorithms, but we find that misspellings, errors in residue numbering, use of mouse names for human proteins and vice versa, and use of ambiguous acronyms remain a substantial barrier to assembly of systematic knowledge about kinases (Bachman et al, 2019; Steppi et al, 2020) and presumably other classes of proteins as well. Moreover, in the scientific literature, findings about kinases are often described at the level of protein families (e.g., MEK, AKT, ERK) or complexes (e.g., mTORC1, PI3K) rather than one or more of their specific protein members (Bachman et al, 2018).…”