Predicting functions for novel amino acid sequences is a long-standing research problem. The UniProt database, which contains protein sequences annotated with Gene Ontology (GO) terms, is one commonly used training dataset for this problem. Predicting protein functions can then be viewed as a multi-label classification problem where the input is an amino acid sequence and the output is a set of GO terms. Recently, deep convolutional neural network (CNN) models have been introduced to annotate GO terms for protein sequences. However, the CNN architecture can only model close-range interactions between amino acids in a sequence. In this paper, first, we build a novel GO annotation model based on the Transformer neural network. Unlike the CNN architecture, the Transformer models all pairwise interactions for the amino acids within a sequence, and so can capture more relevant information from the sequences. Indeed, we show that our adaptation of the Transformer yields higher classification accuracy when compared to the recent CNN-based method DeepGO. Second, we modify our model to take motifs in the protein sequences found by BLAST as additional input features. Our strategy is different from other ensemble approaches that average the outcomes of BLAST-based and machine learning predictors. Third, we integrate into our Transformer the metadata about the protein sequences such as 3D structure and protein-protein interaction (PPI) data. We show that such information can greatly improve the prediction accuracy, especially for rare GO labels.

1 Introduction

Predicting protein functions is an important task in computational biology. With the cost of sequencing continuing to decrease, the gap between the numbers of labeled and unlabeled sequences continues to grow [18]. Protein functions are described by Gene Ontology (GO) terms [16].
Predicting protein functions is a multi-label classification problem where the input is an amino acid sequence and the output is a set of GO terms. GO terms are organized into a hierarchical tree, where generic terms (e.g. cellular anatomical entity) are parents of specific terms (e.g. perforation plate). Due to this tree structure, if a GO term is assigned to a protein, then all its ancestors are also assigned to this same protein.

When analyzing only the amino acid sequence data to predict protein functions, there are two major trends. The first trend relies on string-matching models like Basic Local Alignment Search Tool (BLAST) to match the unknown sequence with labeled proteins in the database [11]. Recently, Zhang et al. [18] combined BLAST with Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST) to retrieve even more labeled proteins which are possibly related to the unknown sequence. The key idea behind BLAST methods is to retrieve proteins that resemble the unknown sequence; most likely, these retrieved proteins will contain similar evolutionarily co...