Fit without Fear: An Interpolation Perspective on Generalization and Optimization in Modern Machine Learning
A striking feature of modern supervised machine learning is its consistent use of techniques that interpolate the data. Deep networks, often containing several orders of magnitude more parameters than data points, are trained to obtain near-zero error on the training set. Yet, at odds with most theory, they show excellent test performance.
In this talk I will discuss, and give some historical context for, the phenomenon of interpolation (zero training loss). I will show how it provides a new perspective on machine learning, one that forces us to rethink some commonly held assumptions and points to significant gaps in our understanding of when classifiers generalize, even in the simplest settings. I will outline some first theoretical results in that direction, showing that such interpolating classifiers can indeed be statistically consistent and even optimal.
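As a concrete, if informal, illustration of an interpolating method that still predicts well, the sketch below fits "ridgeless" kernel regression to a noisy toy problem: the fitted function passes through every training label exactly, yet test error remains small. The Laplacian kernel, bandwidth, synthetic data, and NumPy implementation are illustrative assumptions, not the specific estimators analyzed in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression task: noisy observations of a smooth target function.
def target(x):
    return np.sin(2 * np.pi * x)

n_train, n_test = 40, 500
X_train = rng.uniform(0.0, 1.0, n_train)
y_train = target(X_train) + 0.1 * rng.standard_normal(n_train)
X_test = rng.uniform(0.0, 1.0, n_test)
y_test = target(X_test)

# Laplacian kernel; its kernel matrix is well enough conditioned that the
# training data can be interpolated exactly, without any regularization.
def kernel(a, b, bandwidth=0.2):
    return np.exp(-np.abs(a[:, None] - b[None, :]) / bandwidth)

# "Ridgeless" kernel regression: solve K @ alpha = y exactly, so the fitted
# function passes through every (noisy) training point -- zero training loss.
K = kernel(X_train, X_train)
alpha = np.linalg.solve(K, y_train)

train_pred = K @ alpha
test_pred = kernel(X_test, X_train) @ alpha

print("train MSE:", np.mean((train_pred - y_train) ** 2))  # ~0: the model interpolates
print("test  MSE:", np.mean((test_pred - y_test) ** 2))    # stays small despite fitting the noise
```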
In the second part of the talk I will turn to the computational power of interpolation, describing how it results in very efficient optimization of over-parametrized models using Stochastic Gradient Descent (SGD). Furthermore, I will show how the simplicity of the interpolating setting can be harnessed to construct very fast and theoretically sound methods for training large-scale kernel machines. I will also briefly describe some new accelerated SGD methods for over-parametrized settings.
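To make the optimization claim concrete, the following minimal sketch runs plain constant-step-size SGD on an over-parametrized linear least-squares problem in which an interpolating solution exists; the training loss is driven to essentially zero at a fast geometric rate, with no step-size decay or averaging. The problem sizes, data model, and step size are assumptions chosen for illustration, not the kernel methods or accelerated schemes described in the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

# Over-parametrized linear least squares: more parameters (d) than examples (n),
# so weight vectors that fit every example exactly (interpolating solutions) exist.
n, d = 50, 500
X = rng.standard_normal((n, d)) / np.sqrt(d)   # rows have roughly unit norm
w_true = rng.standard_normal(d)
y = X @ w_true                                  # labels a linear model can match exactly

w = np.zeros(d)
step = 1.0                                      # constant step size, no decay schedule
for epoch in range(100):
    for i in rng.permutation(n):                # plain SGD, one example per update
        grad = (X[i] @ w - y[i]) * X[i]         # gradient of 0.5 * (x_i . w - y_i)^2
        w -= step * grad
    train_loss = 0.5 * np.mean((X @ w - y) ** 2)
    if epoch % 25 == 0:
        print(f"epoch {epoch:3d}  training loss {train_loss:.3e}")

print("final training loss:", train_loss)       # essentially zero: SGD finds an interpolating solution
```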