“…To capture long-term temporal dependencies, we insert cascaded temporal convolutional modules (TCMs) [10] in the bottleneck. To reduce the number of parameters, we opt for the squeezed version [15,16], i.e., S-TCM, where the feature size is first compressed to 64, rather than 512 as in the original work [10], followed by dilated convolutions. For each stage, we stack three groups of TCMs, each of which includes six S-TCMs with dilation rates d = {1, 2, 4, 8, 16, 32}.…”
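
To make the stacking concrete, the following minimal sketch enumerates the dilation rates across one stage and estimates the resulting temporal receptive field. The dilation set and the three groups per stage come from the excerpt. The kernel size of 3 and the simplification that each S-TCM contributes one dilated convolution in depth are assumptions made for illustration and are not stated in the excerpt. The function names are hypothetical.

```python
DILATIONS = [1, 2, 4, 8, 16, 32]  # dilation rates given in the text
GROUPS_PER_STAGE = 3              # three groups of TCMs per stage, per the text
KERNEL_SIZE = 3                   # ASSUMPTION: kernel size is not given in the text


def dilation_schedule(groups=GROUPS_PER_STAGE, dilations=DILATIONS):
    """Return the dilation rate of every S-TCM in one stage, in stacking order."""
    return [d for _ in range(groups) for d in dilations]


def receptive_field(schedule, kernel_size=KERNEL_SIZE):
    """Receptive field (in frames) of stacked dilated 1-D convolutions.

    ASSUMPTION: each module contributes exactly one dilated convolution in depth,
    so each one widens the field by (kernel_size - 1) * dilation frames.
    """
    return 1 + sum((kernel_size - 1) * d for d in schedule)


schedule = dilation_schedule()
print(len(schedule))              # 18 S-TCMs per stage (3 groups x 6 modules)
print(receptive_field(schedule))  # 379 frames under the assumptions above
```

Under these assumptions, repeating the exponentially growing dilation pattern three times gives a receptive field of several hundred frames with only 18 lightweight modules, which is consistent with the stated aim of capturing long-term dependencies at a low parameter cost.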