Malware detection is increasingly important because of the prevalence of malicious software (malware), including ransomware, in cyberspace. Conventional detection approaches rely on malware features, which may be static, dynamic, or hybrid. Advances in Deep Learning (DL) have attracted considerable interest in malware detection applications. In particular, file binaries are fed into a DL neural network for training and testing. Despite the sound theoretical basis of this approach, we find that overfitting occurs even when precautions are taken, such as applying sufficient dropout layers. Limitations can also be attributed to the final classification layers (a shallow fully connected network followed by softmax classification). In this paper, we apply transfer learning using ShuffleNet and DenseNet-201, two models pretrained on a large dataset to recognize everyday objects. All layers of these models are frozen to prevent overfitting, and classification is performed by an optimal error-correcting output code (ECOC) ensemble of Support Vector Machines (SVMs). Several ECOC coding matrices were applied: One vs. All (OVA), One vs. One (OVO), Dense Random (DR), and Sparse Random (SR). These configurations differ in complexity and ensemble size, and hence present a tradeoff between computational cost and the capacity for complex nonlinear separation. Because searching over continuous SVM parameter values for the optimal configuration can be computationally expensive, we restrict parameter optimization to combinations of discrete values explored with a grid search. We test the proposed model on the Malimg, MaleVis, Virus-MNIST, and Dumpware10 datasets. The results show accuracy that is better than or comparable to that of existing work. The best/average accuracy values for each dataset over 10 trials are: Malimg (99.14%/98.87%), MaleVis (95.01%/93.91%), Virus-MNIST (86.36%/85.79%), and Dumpware10 (96.62%/95.79%).
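To illustrate how the choice of ECOC coding matrix determines ensemble size, the following minimal sketch builds the OVA and OVO matrices for a hypothetical four-class problem and decodes a set of binary learner outputs. It is not the implementation used in this work: the function names (`ova_matrix`, `ovo_matrix`, `decode`), the row/column convention (rows are classes, columns are binary learners), and the simple Hamming-style decoding are assumptions made for illustration. The DR and SR matrices are omitted because their entries are drawn at random.

```python
from itertools import combinations


def ova_matrix(k):
    """One vs. All: k learners; learner j separates class j (+1) from the rest (-1)."""
    return [[1 if row == col else -1 for col in range(k)] for row in range(k)]


def ovo_matrix(k):
    """One vs. One: one learner per class pair; classes outside the pair get 0."""
    pairs = list(combinations(range(k), 2))
    matrix = [[0] * len(pairs) for _ in range(k)]
    for col, (a, b) in enumerate(pairs):
        matrix[a][col] = 1
        matrix[b][col] = -1
    return matrix


def decode(matrix, outputs):
    """Pick the class whose codeword disagrees least with the learner outputs.

    `outputs` holds one +1/-1 decision per binary learner. Zero entries in a
    codeword mean the learner was not trained on that class and are skipped.
    This is a simplified Hamming-style rule, assumed here for illustration.
    """
    def distance(codeword):
        return sum(1 for c, o in zip(codeword, outputs) if c != 0 and c != o)

    return min(range(len(matrix)), key=lambda cls: distance(matrix[cls]))


k = 4  # hypothetical number of malware families
print("OVA learners:", len(ova_matrix(k)[0]))  # 4, i.e. k
print("OVO learners:", len(ovo_matrix(k)[0]))  # 6, i.e. k * (k - 1) / 2
```

OVA needs only k binary SVMs, but each must separate one class from all others; OVO trains k(k-1)/2 SVMs, each on a simpler two-class problem. This is the tradeoff between computational cost and separation capacity described above.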