The functions of an organism and its biological processes derive from the expression and activity of genes and proteins. Quantifying and predicting gene and protein expression values is therefore a crucial aspect of scientific research. For gene expression prediction, the available machine learning-based approaches use the gene sequence as input to neural network models. Some techniques, including Xpresso and Basenji, have been proposed to predict gene expression values in samples. However, these architectures are mainly based on convolutional or Long Short-Term Memory networks and neglect the role of attention mechanisms in selecting the portions of the sequence that are relevant for prediction. In addition, as far as we know, there is no model that predicts protein expression values from the sequence of genes (nitrogenous bases) or proteins (amino acids).
Here, we present a new Perceiver-type model, which exploits a transformer-based architecture to benefit from the attention module while avoiding the quadratic complexity of standard attention-based architectures. The contributions of this work are the following: 1. Development of the DNAPerceiver model for improving the prediction of gene expression starting from the gene sequence; 2. Development of a ProteinPerceiver model for predicting protein expression values starting from the amino acid sequence; 3. Development of a Protein\&DNAPerceiver model for predicting protein expression values by simultaneously combining the sequences of nitrogenous bases and amino acids; 4. Evaluation of the models under multiple conditions: predicting values on cell lines, mice, and tumor tissues of glioblastoma and lung cancer.
The results show the effectiveness of the Perceiver-type models in predicting gene and protein expression values starting from the gene or protein sequence. In this context, adding further regulatory and epigenetic information to the model could improve prediction performance.
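As an illustration of how a Perceiver-type model sidesteps the quadratic cost of standard self-attention, the following is a minimal NumPy sketch and not the implementation used in this work. A small latent array of N rows attends to a one-hot encoded DNA sequence of length L, so the attention matrix has shape N x L rather than L x L. The function names (`one_hot_dna`, `cross_attention`), the random weights, and all sizes are assumptions chosen only for the example.

```python
import numpy as np


def one_hot_dna(seq):
    """Encode a DNA string as an (L, 4) one-hot matrix.

    Bases outside A/C/G/T (for example N) become all-zero rows.
    """
    alphabet = "ACGT"
    out = np.zeros((len(seq), 4))
    for i, base in enumerate(seq.upper()):
        j = alphabet.find(base)
        if j >= 0:
            out[i, j] = 1.0
    return out


def cross_attention(latents, inputs, w_q, w_k, w_v):
    """Single-head cross-attention: latents are queries, inputs give keys/values."""
    q = latents @ w_q                          # (N, d)
    k = inputs @ w_k                           # (L, d)
    v = inputs @ w_v                           # (L, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (N, L): linear in L, not L x L
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over sequence positions
    return weights @ v, weights


# Hypothetical sizes: 600-base sequence, 8 latent vectors of width 16.
rng = np.random.default_rng(0)
seq = "ACGTNACGGTCA" * 50
x = one_hot_dna(seq)
n_latents, d = 8, 16
latents = rng.normal(size=(n_latents, d))
w_q = rng.normal(size=(d, d))
w_k = rng.normal(size=(4, d))
w_v = rng.normal(size=(4, d))

summary, weights = cross_attention(latents, x, w_q, w_k, w_v)
print(summary.shape, weights.shape)  # (8, 16) (8, 600)
```

In a full model, the latent summary would be refined by further attention layers and fed to a regression head that outputs the expression value. Those components are omitted here.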