“…In [39], it was proved that the last layer of any fully-connected network is identical to that of a deep CNN with at most 8 times the number of free parameters. For approximating or learning ridge functions [10], radial functions [23], and functions from Korobov spaces [24], deep CNNs can achieve the same accuracy with a much smaller number of free parameters than fully-connected networks. In a recent application of CNNs to the readability of Chinese texts [11], it was found that one or two layers are already efficient.…”