“…For all models, we optimized the learning rate (lr), lr = [5×10⁻⁴, 5×10⁻⁵, or 5×10⁻⁶]. The following hyperparameters were optimized: (a) GCN: hidden atom features (h_a), number of convolutional layers (n_c), hidden multiset transformer nodes (h_t), hidden predictor features (h_p), with h_a = [32, 64, 128, 256, 512], n_c = [1, 2, 3, 4, 5], [32, 64, 128, 256]. All models were trained for 300 epochs, using early stopping with a patience of 10 epochs.…”
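The search procedure described above can be sketched as a plain grid search with patience-based early stopping. This is a minimal illustration, not the authors' code: the helper names (`early_stopping_train`) and the synthetic validation-loss input are hypothetical, and only the learning-rate, h_a, and n_c grids are taken from the text.

```python
import itertools

def early_stopping_train(val_losses, max_epochs=300, patience=10):
    """Simulate training with early stopping: stop once the validation
    loss has not improved for `patience` consecutive epochs.
    Returns (best_val_loss, epoch_at_which_training_stopped)."""
    best = float("inf")
    wait = 0  # epochs since last improvement
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return best, epoch
    return best, min(len(val_losses), max_epochs) - 1

# Grids from the text (the h_t / h_p values are omitted here, since the
# excerpt does not state which parameter the remaining list belongs to).
learning_rates = [5e-4, 5e-5, 5e-6]
hidden_atom_features = [32, 64, 128, 256, 512]
num_conv_layers = [1, 2, 3, 4, 5]

grid = list(itertools.product(learning_rates,
                              hidden_atom_features,
                              num_conv_layers))
print(len(grid))  # number of GCN configurations over these three grids
```

In a real run, `val_losses` would be produced epoch by epoch by the model's training loop, and the configuration with the lowest best validation loss across the grid would be selected.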