“…The following regression algorithms were used: linear regression, ridge regression, lasso, elastic net, random forest, support vector machines with nested cross-validation, and k-nearest neighbour regression [81]. To include information from the regulatory DNA sequences in the shallow models, k-mers of lengths 4 to 6 bp were extracted from the regulatory DNA sequences [82] […] Table S1-6), which included inception layers [84], (ii) 1 to 2 bidirectional recurrent neural network (RNN) layers [85], and (iii) 1 to 2 fully connected (FC) layers, in a global architecture layout CNN-RNN-FC [30, 86-88]. Training the networks either (i) concurrently or (ii) consecutively, by weight transfer on different variables (regulatory sequences to CNN and RNN, numeric variables to FC), showed that the architecture yielding the best results was a concurrently trained CNN (3 layers)-FC (2 layers) [12, 89-91], which was used for all models.…”
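The k-mer step described in the excerpt can be illustrated with a minimal sketch. The excerpt states only that k-mers of lengths 4 to 6 bp were extracted from the regulatory sequences. The use of overlapping windows, raw counts as feature values, the skipping of k-mers that contain ambiguous bases, and the function names `kmer_counts` and `feature_vector` are all assumptions made here for illustration, not details confirmed by the source.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k_min=4, k_max=6, alphabet="ACGT"):
    """Count overlapping k-mers of lengths k_min..k_max in a DNA sequence.

    Assumption: k-mers containing characters outside `alphabet`
    (for example N) are skipped rather than counted.
    """
    seq = seq.upper()
    allowed = set(alphabet)
    counts = Counter()
    for k in range(k_min, k_max + 1):
        # Slide a window of width k over the sequence, one base at a time.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= allowed:
                counts[kmer] += 1
    return counts


def feature_vector(seq, k_min=4, k_max=6, alphabet="ACGT"):
    """Return (vocabulary, counts) with one fixed column per possible k-mer.

    A fixed vocabulary keeps the columns aligned across sequences, which
    is what a tabular regressor (ridge, lasso, random forest, ...) needs.
    """
    counts = kmer_counts(seq, k_min, k_max, alphabet)
    vocab = [
        "".join(p)
        for k in range(k_min, k_max + 1)
        for p in product(alphabet, repeat=k)
    ]
    return vocab, [counts[kmer] for kmer in vocab]


# Hypothetical toy sequence, not taken from the source data.
example = "ACGTACGTNA"
counts = kmer_counts(example)
vocab, vec = feature_vector(example)
```

With k from 4 to 6 the vocabulary has 4^4 + 4^5 + 4^6 = 5376 columns per sequence, so the resulting feature matrix is wide and sparse. That is one plausible reason for including regularised models such as lasso and elastic net among the regressors, although the excerpt does not state this.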