We introduce the Hölder width, which measures the best error performance of some recent nonlinear approximation methods, such as deep neural network approximation. Then, we investigate the relationship between Hölder widths and other widths, showing that some Hölder widths are essentially smaller than n-Kolmogorov widths and linear widths. We also prove that, as the Hölder constants grow with n, the Hölder widths are much smaller than the entropy numbers. The fact that Hölder widths are smaller than the known widths implies that the nonlinear approximation represented by deep neural networks can provide a better approximation order than other existing approximation methods, such as adaptive finite elements and n-term wavelet approximation. In particular, we show that Hölder widths for Sobolev and Besov classes, induced by deep neural networks, are O(n−2s/d) and are much smaller than other known widths and entropy numbers, which are O(n−s/d).