“…For Classify, most works measure MGR performance by classification accuracy (the ratio of "correct" predictions to all observations) computed from k-fold stratified cross-validation (kfCV), e.g., 2fCV (4 papers) [7,22,23,56], 3fCV (3 papers) [18,71,74], 5fCV (6 papers) [3,13,30,31,53,100], and 10fCV (55 papers) [2,5,9,11,14,16,17,24-26,28,29,34,35,37,39-42,44,47-51,57,58,60-64,66-68,70,72,73,75,76,78,79,82-85,88-91,94-96,98,99]. Most of these use a single run of cross-validation; however, some perform multiple runs, e.g., 10 independent runs of 2fCV (10x2fCV) [56] or 20x2fCV [22,23], 10x3fCV [71,74], and 10x10fCV [37,70,72,75,83-85]. In one experiment, Li and Sleep [42] use 10fCV with random partitions; in another, they partition the excerpts into folds based on their file number, roughly implementing an artist filter.…”
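The evaluation schemes summarized above (stratified kfCV, repeated NxkfCV runs, and fold assignment by a grouping key as a rough artist filter) can be illustrated with a short sketch. The stdlib-only Python below is a minimal illustration under our own assumptions, not the procedure of any cited work. `stratified_folds` deals each class's shuffled excerpts round-robin into k folds, and calling it with a different seed per run gives repeated runs such as 10x2fCV. `group_folds` is a hypothetical stand-in for an artist filter that keeps every excerpt sharing a group key (e.g., an artist or a file-number range) in one fold. `majority_class` is a placeholder classifier. All names are illustrative. Accuracy is computed per run as correct predictions over all observations, since each excerpt is tested exactly once per run.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Hashable, List, Sequence

Folds = List[List[int]]


def stratified_folds(labels: Sequence[Hashable], k: int, seed: int) -> Folds:
    """Partition indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds: Folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class, key=str):  # fixed class order for reproducibility
        idx = by_class[y]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            # Deal each class round-robin; carrying the offset keeps fold sizes even.
            folds[(offset + j) % k].append(i)
        offset += len(idx)
    return folds


def group_folds(groups: Sequence[Hashable], k: int) -> Folds:
    """Hypothetical artist filter: all excerpts of one group land in the same fold."""
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    folds: Folds = [[] for _ in range(k)]
    # Greedy balancing: largest groups first, each into the currently smallest fold.
    for g in sorted(members, key=lambda g: -len(members[g])):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[g])
    return folds


def cv_accuracies(features, labels, runs: int,
                  folds_for_run: Callable[[int], Folds],
                  train_and_predict: Callable) -> List[float]:
    """Return one accuracy per run: correct predictions / all observations."""
    accuracies = []
    for run in range(runs):
        folds = folds_for_run(run)
        correct = total = 0
        for f, test_idx in enumerate(folds):
            # Train on every fold except f, then predict fold f.
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            preds = train_and_predict([features[i] for i in train_idx],
                                      [labels[i] for i in train_idx],
                                      [features[i] for i in test_idx])
            correct += sum(p == labels[i] for p, i in zip(preds, test_idx))
            total += len(test_idx)
        accuracies.append(correct / total)
    return accuracies


def majority_class(train_x, train_y, test_x):
    """Placeholder classifier: always predict the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(test_x)


# Usage: 10x2fCV on a toy, perfectly balanced two-genre set.
labels = ["jazz"] * 10 + ["rock"] * 10
features = list(range(20))
accs = cv_accuracies(features, labels, runs=10,
                     folds_for_run=lambda r: stratified_folds(labels, 2, seed=r),
                     train_and_predict=majority_class)
print(accs)
```

For an artist-filtered variant, one could pass `folds_for_run=lambda r: group_folds(artists, k)` with a hypothetical per-excerpt `artists` list; since that partition is deterministic, a single run suffices. A pure group split does not guarantee stratification, so class proportions per fold may drift.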