Although many autofocus (AF) algorithms have been proposed and compared in the literature [1], [2], none of them works perfectly in practice for images from a scanning electron microscope (SEM). This is because a simple scalar metric cannot fully capture image quality, especially across the variety of SEM samples, hardware specifications, and measurement environments. In addition, a single scalar AF metric cannot govern several control parameters at once, such as sharpness, contrast, and brightness. In the era of the 4th industrial revolution, the ultimate goal is a fully autonomous machine that controls the SEM parameters to obtain the best image, just as a SEM specialist does. As a first step toward such machine-controlled SEM operation, we propose a supervised learning framework that automatically assesses the quality of sample images as a SEM specialist would. Specifically, we develop deep learning software that takes a sample image and the current control parameters (brightness, contrast, and focus) as input and automatically scores the quality of the sample image. To evaluate how accurately the proposed software scores images, we define the mean squared error (MSE) loss $L$ as $L = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$, where $y_i$ is the score that a SEM specialist gives to the $i$-th image, $\hat{y}_i$ is the score predicted by the proposed software, and $N$ is the number of images. The following neural network architectures are used for deep learning: i) a parallelized convolutional neural network (CNN) and fully-connected neural network (FCNN), referred to as PACF, as in Fig. 1, ii) VGG [3], and iii) ResNet [4]. Grayscale images of two sample types, a grid and a tin ball, are used. The original image resolution is 640x480 pixels; we augment the data by cropping the images to 224x224 pixels, flipping them vertically, and rotating them randomly. As a result, the total number of images is 2134.
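As a minimal sketch of how the MSE loss described above, and the RMSE derived from it, can be computed, the following Python snippet averages the squared differences between specialist scores and predicted scores. The function names and the example scores are hypothetical and chosen only for illustration; they are not taken from the experiments.

```python
import math


def mse_loss(specialist_scores, predicted_scores):
    """Mean of squared differences between specialist and predicted scores."""
    if len(specialist_scores) != len(predicted_scores):
        raise ValueError("score lists must have the same length")
    if not specialist_scores:
        raise ValueError("at least one score pair is required")
    squared_errors = [
        (y - y_hat) ** 2
        for y, y_hat in zip(specialist_scores, predicted_scores)
    ]
    return sum(squared_errors) / len(squared_errors)


def rmse(specialist_scores, predicted_scores):
    """Square root of the MSE, in the same units as the scores."""
    return math.sqrt(mse_loss(specialist_scores, predicted_scores))


# Hypothetical scores for four images (invented for illustration only).
specialist = [8.0, 5.0, 2.0, 9.0]
predicted = [7.0, 5.0, 4.0, 9.0]

print(mse_loss(specialist, predicted))  # squared errors are 1, 0, 4, 0
print(rmse(specialist, predicted))
```

Because the RMSE is expressed in the same units as the scores, it is the easier of the two quantities to read as a typical scoring error.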
For supervised learning, the deep learning network must be trained with known inputs and outputs. Thus, 1493 of the 2134 images are used for training, and the remaining 641 images are used for testing. The control parameters (brightness, contrast, and focus) are set differently for each image. In addition, each image is given an overall quality score by a SEM specialist. Experiments are run in the following environment: GeForce GTX 1080M, CUDA 10.0, Python 3.6.7, PyTorch 1.0.0, and OpenCV 4.0.1. Table 1 compares the average test runtime per image and the root mean squared error (RMSE) of the existing AF algorithms and the deep-learning-based score prediction networks, which are trained for 100 epochs with a learning rate of 0.00015. For comparison, the values of the AF functions are normalized to scores by . The proposed networks outperform the existing AF functions in both test runtime and RMSE. Fig. 2 shows examples of the scores from the specialist, from one of the existing AF algorithms (based on absolute variance), and from one of the proposed networks (PACF). Even in the ...