“…We use the notation $\mathbf{h} = [h_1, h_2, \ldots, h_L]$ to represent a network with $L$ hidden layers, where $h_l$ denotes the number of neurons in fully connected layer $l$, or the number of kernels in convolutional layer $l$. In recent works that apply DNNs to decode ECCs, the training set explodes rapidly as the source word length grows. For example, with a rate-0.5 $(n = 1024, k = 512)$ ECC, one epoch would have to cover $2^{512}$ possible codewords of length 1024, which makes training prohibitively complex and renders DNN-based decoding difficult to implement in practical systems [28], [29], [31], [32]. However, we note that this problem does not arise in FL CS decoding, since CS source words are typically considerably shorter, possibly only up to a few dozen symbols [1], [6]–[17].…”
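A minimal sketch (not from the paper) of why exhaustive training-set enumeration is feasible only for short codes: the codebook of a linear $(n, k)$ code has $2^k$ entries, so it can be enumerated for a toy code but not for $k = 512$. The generator matrix and function name below are illustrative, not taken from the cited works.

```python
# Illustrative sketch, assuming a systematic generator-matrix encoder.
import itertools
import numpy as np

def all_codewords(G):
    """Enumerate every codeword of a linear code with k x n generator matrix G."""
    k, n = G.shape
    for bits in itertools.product([0, 1], repeat=k):
        # Encode each of the 2^k possible messages over GF(2).
        yield (np.array(bits) @ G) % 2

# Toy (n=7, k=4) Hamming code: 2^4 = 16 codewords, trivially enumerable.
G_hamming = np.array([
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
])
print(sum(1 for _ in all_codewords(G_hamming)))  # 16

# For the rate-0.5 (n=1024, k=512) ECC in the text, the codebook holds
# 2**512 codewords -- far beyond any feasible training set.
print(f"2^512 = {2**512:.3e}")  # about 1.341e+154
```

By contrast, a fixed-length CS source word of a few dozen symbols keeps the space of inputs small enough for a DNN decoder to see representative coverage during training.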