Unifying variational formulation of supervised learning: From kernel methods to neural networks
A powerful framework for supervised learning is the minimization of a cost that consists of a data-fidelity term plus a regularization functional. In this talk, I introduce a unifying regularization functional that depends on a generic operator L and a Radon-domain p-norm. When the norm is Euclidean (p=2), the proposed formulation yields a solution that involves radial basis functions and is compatible with the classical kernel methods of machine learning. By contrast, for p=1 (total-variation norm), the solution takes the form of a two-layer neural network whose activation function is determined by the regularization operator. In particular, one retrieves the popular ReLU networks by taking L to be the Laplacian. The proposed setting offers guarantees of universal approximation for a broad family of regularization operators or, equivalently, for a wide variety of shallow neural networks, including cases (such as the ReLU) where the activation function grows polynomially. It also explains the beneficial role of biases and skip connections in neural architectures.
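As a rough sketch of the formulation (the notation below is illustrative and fills in symbols not spelled out in the abstract), the learning problem can be written as

\[
\min_{f} \; \sum_{m=1}^{M} E\bigl(y_m, f(x_m)\bigr) \;+\; \lambda\, \|\mathrm{L} f\|_{p},
\]

where the pairs (x_m, y_m) are the training data, E is the data-fidelity term, and \|\mathrm{L} f\|_{p} denotes the Radon-domain p-norm of the regularization operator L applied to f. For p=1, the advertised solution then takes the parametric form of a two-layer network with an affine (bias and skip-connection) component,

\[
f(x) \;=\; \sum_{k=1}^{K} a_k\, \sigma\bigl(\langle w_k, x\rangle - b_k\bigr) \;+\; c^{\top} x + c_0,
\]

with the activation \sigma fixed by the choice of L (the ReLU when L is the Laplacian).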