Relaxed Gradient Estimators
In modern machine learning there is a demand for reliable gradient estimators. When the random variable we seek to control is differentiably reparametrizable, it is easy to derive low-variance gradient estimators that enable the training of large-scale stochastic computation graphs. When this is not the case, for example when the random variable is discrete, most practitioners resort to high-variance estimators.
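To make the contrast concrete, here is a minimal NumPy sketch (the toy objectives, parameter values, and sample count are illustrative assumptions, not examples from the talk). It compares a reparameterization-trick estimator with a score-function estimator for a Gaussian objective, then shows the score-function estimator for a Bernoulli objective, where no pathwise derivative through the sample is available:

    # Toy comparison of the two estimator families:
    #   d/d(mu)    E_{z ~ N(mu, 1)}[z^2]            (true gradient: 2*mu)
    #   d/d(theta) E_{b ~ Bernoulli(theta)}[(b - 0.45)^2]
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Continuous case: differentiable reparameterization z = mu + eps, eps ~ N(0, 1).
    mu = 1.0
    eps = rng.standard_normal(n)
    z = mu + eps
    reparam_grad = 2.0 * z                  # pathwise: d/d(mu) f(mu + eps) = f'(z)
    score_grad_cont = z**2 * (z - mu)       # score function: f(z) * d/d(mu) log N(z; mu, 1)
    print("true gradient 2*mu =", 2 * mu)
    print("reparam   mean/std:", reparam_grad.mean(), reparam_grad.std())
    print("score fn  mean/std:", score_grad_cont.mean(), score_grad_cont.std())

    # Discrete case: no pathwise derivative through b, so the score-function
    # (REINFORCE) estimator is the default, and it is empirically much noisier.
    theta = 0.6
    b = (rng.random(n) < theta).astype(float)
    f = (b - 0.45) ** 2
    score_grad_disc = f * (b / theta - (1 - b) / (1 - theta))
    print("discrete score-fn mean/std:", score_grad_disc.mean(), score_grad_disc.std())

Both estimators are unbiased, but the score-function versions exhibit markedly higher sample variance, which is the gap the talk addresses.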
In this talk we cover a series of three papers, each building on the previous, that address this problem for discrete random variables. The first introduces biased gradient estimators based on continuous relaxations of discrete random variables (Maddison et al., 2017; Jang et al., 2017). The second introduces a debiasing scheme for those relaxed estimators (Tucker et al., 2017). The third generalizes the debiasing scheme so that the functional form of the relaxation can itself be optimized for low variance (Grathwohl et al., 2017). Taken together, these papers represent the current state of the art. We conclude the talk by discussing the barriers that remain to making this framework generically useful for gradient estimation in stochastic computation graphs.
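As a rough sketch of the relaxation idea behind the first pair of papers (assuming a small categorical variable parameterized by logits; the temperature value and the NumPy implementation are illustrative choices, not code from the papers): the Gumbel-max trick draws an exact categorical sample via an argmax, and the Concrete / Gumbel-Softmax relaxation replaces that argmax with a temperature-controlled softmax, yielding a sample on the simplex that is differentiable in the logits.

    # Sketch of the Concrete / Gumbel-Softmax relaxation: swap the argmax in
    # the Gumbel-max trick for a softmax at temperature tau.
    import numpy as np

    rng = np.random.default_rng(0)

    def gumbel_max_sample(logits):
        """Exact categorical sample via the Gumbel-max trick (not differentiable)."""
        g = -np.log(-np.log(rng.random(logits.shape)))   # standard Gumbel noise
        return np.argmax(logits + g)

    def concrete_sample(logits, tau=0.5):
        """Relaxed sample on the simplex; approaches one-hot as tau -> 0."""
        g = -np.log(-np.log(rng.random(logits.shape)))
        y = (logits + g) / tau
        y -= y.max()                                     # numerical stability
        e = np.exp(y)
        return e / e.sum()

    logits = np.array([1.0, 0.0, -1.0])
    print("hard sample:   ", gumbel_max_sample(logits))
    print("relaxed sample:", concrete_sample(logits, tau=0.5))

Using the relaxed sample in place of the discrete one gives low-variance but biased gradients; the second and third papers instead use such a relaxation to build a control variate for the score-function estimator, removing the bias while retaining much of the variance reduction.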
Bio:
Chris Maddison is a DPhil student in the Statistical Machine Learning Group in the Department of Statistics at the University of Oxford supervised by Yee Whye Teh and Arnaud Doucet. He also spends two days a week as a Research Scientist at DeepMind. Previously, he received his MSc. from the University of Toronto supervised by Geoffrey Hinton. He was one of the primary contributors to the AlphaGo project, and his research interests include probabilistic inference, Monte Carlo methods, and neural networks.