“…When the sigmoid activation function is used, the "vanishing gradient problem" can occur: because the derivative of the sigmoid function lies in the range of 0 to 0.25, repeatedly multiplying these derivatives during backpropagation causes the gradients of the earlier layers to become vanishingly small. The CNN model consisted of a convolution layer, a max-pooling layer, and a fully connected layer [50]. In this study, the CNN filter sizes in the convolution layer were 2×128, 3×128, 4×128, and 5×128, meaning that the filters identify the characteristics of two, three, four, and five consecutive letters, respectively.…”
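The vanishing-gradient mechanism described above can be verified numerically: the sigmoid derivative never exceeds 0.25, so the product of one such factor per layer shrinks geometrically with depth. A minimal NumPy sketch (the 20-layer depth is illustrative, not from the source):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x)), maximized at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

xs = np.linspace(-10.0, 10.0, 1001)
print(sigmoid_grad(xs).max())  # peaks at 0.25

# Backpropagation multiplies one derivative factor per layer, so even in
# the best case (0.25 at every layer) the gradient decays geometrically.
grad = 1.0
for _ in range(20):  # hypothetical 20-layer chain
    grad *= 0.25
print(grad)  # vanishingly small
```

This is why activations whose derivatives do not saturate toward zero (e.g. ReLU) are often preferred in deep stacks.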
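The filter sizes 2×128 through 5×128 suggest a TextCNN-style layout in which each filter spans n consecutive letters by the full 128-dimensional character embedding, followed by max pooling over positions. A minimal NumPy sketch under that assumption (the sequence length, filter count, and random weights are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 128    # assumed embedding width, matching the 2x128 ... 5x128 filters
SEQ_LEN = 30       # hypothetical input length in letters
NUM_FILTERS = 4    # filters per width; illustrative only

# One 128-dimensional embedding vector per letter position.
x = rng.standard_normal((SEQ_LEN, EMBED_DIM))

pooled = []
for width in (2, 3, 4, 5):  # filter heights: 2-5 consecutive letters
    W = rng.standard_normal((NUM_FILTERS, width, EMBED_DIM))
    # Valid convolution: each filter covers `width` letters x all 128 dimensions.
    feats = np.stack([
        np.tensordot(x[i:i + width], W, axes=([0, 1], [1, 2]))
        for i in range(SEQ_LEN - width + 1)
    ])                               # shape: (positions, NUM_FILTERS)
    pooled.append(feats.max(axis=0))  # max pooling over positions
features = np.concatenate(pooled)     # input to the fully connected layer
print(features.shape)                 # (16,)
```

Because each filter's width equals the embedding dimension, it slides only along the letter axis, so a 3×128 filter responds to patterns of exactly three consecutive letters.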