Learning the learning rate: how to repair Bayes when the model is wrong
Bayesian inference can behave badly if the model under consideration is wrong yet useful: the posterior may fail to concentrate even for large samples, leading to extreme overfitting in practice. We demonstrate this on a simple regression problem. The problem goes away if we make the so-called learning rate small enough, which essentially amounts to making the prior more, and the data less, important. Standard Bayes sets the learning rate to 1, which can be far too high under model misspecification; in the exponential weights algorithm, a cousin of Bayes popular in the learning-theory community, one often sets the learning rate to 1/√(sample size), which is too low if the setting is not adversarial. We introduce the safe Bayesian estimator, which learns the optimal learning rate from the data. Both in theory and in practice, it behaves essentially as well as standard Bayes when the model is correct, but can do much better when the model is wrong. We also give an intuitive explanation (acceptable even to Bayesians!) of why a learning rate much smaller than 1 can sometimes be much better.
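To make the notion of a "learning rate" concrete, the sketch below shows a generalized (tempered) Bayesian update in which the likelihood is raised to a power eta: eta = 1 recovers standard Bayes, while eta < 1 gives the data less weight relative to the prior. This is only an illustration of the tempering idea on a toy conjugate Normal-mean model, not the paper's estimator or its rule for choosing eta; the function name and toy setup are assumptions made for the example.

```python
# Minimal sketch (assumed toy example, not the paper's algorithm):
# a tempered Bayesian posterior for a Normal mean, where the likelihood
# is raised to the power eta (the learning rate). eta = 1 is standard
# Bayes; smaller eta makes the prior more, and the data less, important.
import numpy as np

def tempered_posterior(data, eta, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Posterior over a Normal mean with the likelihood tempered by eta.

    The usual conjugate update applies, except each observation counts
    with weight eta, so the effective sample size is eta * n.
    """
    n = len(data)
    post_precision = 1.0 / prior_var + eta * n / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + eta * np.sum(data) / noise_var)
    return post_mean, post_var

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=50)

for eta in [1.0, 0.5, 0.1]:
    m, v = tempered_posterior(data, eta)
    print(f"eta={eta:4.2f}  posterior mean={m:5.2f}  posterior variance={v:6.4f}")
# With eta = 1 the data dominate; as eta shrinks, the posterior stays
# closer to the prior and concentrates more slowly.
```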