Much work has recently identified the need to combine deep learning with extreme learning in order to strike a performance balance with accuracy, especially in the domain of multimedia applications. When considering this new paradigm—namely, the convolutional extreme learning machine (CELM)—we present a systematic review that investigates alternative deep learning architectures that use the extreme learning machine (ELM) for faster training to solve problems that are based on image analysis. We detail each of the architectures that are found in the literature along with their application scenarios, benchmark datasets, main results, and advantages, and then present the open challenges for CELM. We followed a well-structured methodology and established relevant research questions that guided our findings. Based on 81 primary studies, we found that object recognition is the most common problem that is solved by CELM, and CCN with predefined kernels is the most common CELM architecture proposed in the literature. The results from experiments show that CELM models present good precision, convergence, and computational performance, and they are able to decrease the total processing time that is required by the learning process. The results presented in this systematic review are expected to contribute to the research area of CELM, providing a good starting point for dealing with some of the current problems in the analysis of computer vision based on images.