“…Some notable contributions include AlphaFold2, RoseTTAFold, ESMFold, OmegaFold, and EMBER2, which have successfully estimated the amino acid sequence-to-structure mapping [7, 8, 9, 10, 11]. More generalized models such as ProtBERT, ProtT5, Ankh, and xTrimoPGLM offer highly effective contextualized sequence representations that map intuitively to protein function, gene ontology, physicochemical properties, and more [12, 13, 14]. Interestingly, some pLM projects have opted for vocabularies other than the traditional single-letter amino acid code.…”