“…Critical to its success is the simultaneous use of a diversity of semantics-preserving, domain-specific augmentation methods [9,18]. For audio modeling, proven augmentation strategies include sampling nearby audio frames [14,15,17,18,19], artificial example mixing [18,19,14], time/frequency masking [18,19], random resized cropping [18] and time/frequency shifts [14,18,19]. In most cases, these augmentations introduce artificial, handcrafted transformations with hyperparameters that must be tuned to lie within a semantics-preserving range.…”