Recently, deep learning classifiers have proven even more robust at pattern recognition and classification than texture analysis techniques. With the broad availability of relatively inexpensive Graphics Processing Units (GPUs), many researchers have begun applying deep learning techniques to visual representations of acoustic traces. Preselected or handcrafted descriptors, such as LBP, are unnecessary for deep learners because they learn salient features during the training phase. Deep learners, moreover, are uniquely suited to handling visual representations of audio because many of the best-known deep classifiers, such as Convolutional Neural Networks (CNNs), take matrices as their input. Humphrey and Bello [17, 18] were among the first to apply CNNs to audio images for music classification and, in doing so, redefined the state of the art in automatic chord detection and recognition. In the same year, Nakashika et al. [19] converted spectrograms to Gray-Level Co-occurrence Matrix (GLCM) maps to train CNNs to perform music genre classification on the GTZAN dataset [20]. Later, Costa et al. [21] fused a CNN with the traditional pattern recognition framework of training SVMs on LBP features to classify the LMD dataset. These works exceeded traditional classification results on these genre datasets.

Up to this point, most work in audio classification has applied the latest advances in machine learning to the problem of sound classification and recognition without modifying the classification process to make it specifically suited to sound recognition. An early exception to the generic approach is found in the work of Sigtia and Dixon [22], who adjusted CNN parameters and structures to reduce the time required to train on a set of audio images. Time reduction was accomplished by replacing sigmoid units with rectified linear units (ReLUs).
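Because the only audio-specific step in this pipeline is rendering the signal as a matrix, the core idea is easy to sketch. The following is a minimal illustration, not any of the cited architectures: it assumes a 128x128 one-channel log-mel spectrogram and ten output classes (e.g., the GTZAN genres), and all layer sizes are arbitrary choices made for the example.

```python
# Minimal sketch: a spectrogram is treated as a one-channel image and
# classified end to end by a small CNN. All hyperparameters here
# (input size, channel counts, class count) are illustrative assumptions.
import torch
import torch.nn as nn

class SpectrogramCNN(nn.Module):
    def __init__(self, n_classes: int = 10):  # e.g., 10 GTZAN genres
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 1 channel: the spectrogram "image"
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 128x128 -> 64x64
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 64x64 -> 32x32
        )
        self.classifier = nn.Linear(32 * 32 * 32, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)                 # (B, 32, 32, 32)
        return self.classifier(x.flatten(1)) # genre logits

# Usage: a batch of 4 single-channel 128x128 spectrogram matrices.
batch = torch.randn(4, 1, 128, 128)
logits = SpectrogramCNN()(batch)             # shape (4, 10)
```

Note that no handcrafted descriptor such as LBP or GLCM is computed anywhere in this sketch; the convolutional filters are expected to learn the salient time-frequency patterns during training, which is precisely the property that distinguishes deep learners from the texture analysis approaches discussed above.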