“…For the SVM (the multi-class strategy was one-vs-all, configured with nested stratified cross-validation within the training set) and Logistic Regression, we varied both the norm used in the penalization (l1, l2) and the penalty parameter C (0.10, 0.1, 10, 25). For BERT, we used the tuning suggested by the authors of the method: batch size (16, 32), learning rate with Adam (5e-5, 3e-5, 2e-5), and number of epochs (3, 4, 5). For the CNN, we varied the optimizer learning rate (0.01, 0.001, 0.0001), activation function (ReLU, linear), optimizer (SGD, Adam, RMSprop), strides (1, 2), kernel size (3, 4, 5), regularization (l1, l2, l1l2), and pooling type (max, average).…”
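The SVM/Logistic Regression part of this search can be sketched with scikit-learn: a grid over the penalty norm and C, tuned by an inner stratified cross-validation and evaluated by an outer one (nested CV). This is a minimal illustration, not the authors' code; the synthetic dataset, the fold counts, and the use of `LinearSVC` (whose default multi-class behavior is one-vs-rest) are assumptions, and the C grid is copied verbatim from the excerpt.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import LinearSVC

# Synthetic multi-class data standing in for the paper's corpus (assumption).
X, y = make_classification(
    n_samples=200, n_classes=3, n_informative=6, random_state=0
)

# Grid from the excerpt: penalization norm and penalty parameter C.
param_grid = {"penalty": ["l1", "l2"], "C": [0.10, 0.1, 10, 25]}

# Inner stratified CV selects hyperparameters within each training split.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(
    # dual=False is required for the l1 penalty with LinearSVC.
    LinearSVC(dual=False, max_iter=10000),
    param_grid,
    cv=inner_cv,
)

# Outer stratified CV gives an unbiased estimate of the tuned model.
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
scores = cross_val_score(search, X, y, cv=outer_cv)
print(scores)
```

The same nested scheme applies to Logistic Regression by swapping the estimator; only the solver constraints for the l1 penalty differ.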