Margins and Neural Collapse in Deep Learning
Much effort in recent years has gone into establishing theoretical guarantees of margin maximization in deep neural networks, extending classical results from Support Vector Machines. A natural question remains, however: how many of the data points act as support vectors? Neural Collapse is a phenomenon recently discovered in deep classifiers, in which the last-layer activations collapse onto their class means, while the means and the last-layer weights take on the structure of dual equiangular tight frames, suggesting that all the points in the training set act as "support vectors". In this talk I will discuss the role of weight decay in the emergence of Neural Collapse in deep homogeneous networks. I will show that certain near-interpolating minima of deep networks satisfy the Neural Collapse condition, and that this can be derived from the gradient flow on the regularized square loss.
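
To fix ideas, here is a minimal sketch of the objects involved, under the standard formulation; the symbols $f_\theta$, $h_\theta$, $W$, $\mu_c$, $y_i$, and $\lambda$ are notation introduced here for illustration, not taken from the talk. The training objective is the weight-decay-regularized square loss, and the dynamics are its gradient flow:

\[
L(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n} \big\| f_\theta(x_i) - y_i \big\|^2 \;+\; \lambda \,\|\theta\|^2,
\qquad
\dot{\theta}(t) \;=\; -\nabla L\big(\theta(t)\big).
\]

Writing $f_\theta(x) = W\, h_\theta(x)$ for the last-layer decomposition, the Neural Collapse condition referred to above asks that, at the minima in question, every last-layer feature collapses onto its class mean, $h_\theta(x_i) = \mu_{c(i)}$ for all training points; that the centered class means $\{\mu_c - \bar{\mu}\}$ form a simplex equiangular tight frame; and that the rows of $W$ are proportional to these centered means (self-duality).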