Fundamental limits of deep neural network learning
This lecture develops the fundamental limits of deep neural network learning from first principle by characterizing what is possible if no constraints on the learning algorithm and on the amount of training data are imposed. Concretely, we consider Kolmogorov-optimal approximation through deep neural networks with the guiding theme being a relation between the complexity of the function (class) to be approximated and the complexity of the approximating network in terms of connectivity and memory requirements for storing the network topology and the associated quantized weights. The theory we develop educes remarkable universality properties of deep networks. Specifically, deep networks are optimal approximants for markedly different function classes such as affine (i.e., wavelet-like) systems and Weyl-Heisenberg systems. This universality is afforded by a concurrent invariance property of deep networks to time-shifts, scalings, and frequency-shifts. In addition, deep networks provide exponential approximation accuracy—i.e., the approximation error decays exponentially in the number of non-zero weights in the network—of the multiplication operation, polynomials, sinusoidal functions, certain smooth functions, and even one-dimensional oscillatory textures and fractal functions such as the Weierstrass function, the latter two of which do not have previously known methods achieving exponential approximation accuracy. We also show that in the approximation of sufficiently smooth functions finite-width deep networks require strictly smaller connectivity than finite-depth wide networks.