DisTraL: Distill and Transfer for Deep Multitask Reinforcement Learning
Most deep reinforcement learning algorithms are data inefficient in complex and rich environments, limiting their applicability to many scenarios. One direction for improving data efficiency is multitask learning with shared neural network parameters, where efficiency may be improved through transfer across related tasks. In practice, however, this is not usually observed, because gradients from different tasks can interfere negatively, making learning unstable and sometimes even less data efficient. Another issue is the different reward schemes between tasks, which can easily lead to one task dominating the learning of a shared model.
We propose a new approach for joint training of multiple tasks, which we refer to as DisTraL (Distill & Transfer Learning). Instead of sharing parameters between the different workers, we propose to share a "distilled" policy that captures common behaviour across tasks. Each worker is trained to solve its own task while being regularized to stay close to the shared policy; the shared policy, in turn, is trained by distillation to be the centroid of all task policies. Both aspects of the learning process are derived by optimizing a single joint objective function. We show that our approach supports efficient transfer on complex 3D environments, outperforming several related methods. Moreover, the proposed learning process is more robust and more stable, attributes that are critical in deep reinforcement learning.
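As a rough sketch of the kind of joint objective described above (the notation and the coefficients c_KL and c_Ent below are illustrative, not taken from the talk), one can couple the per-task returns with KL and entropy regularisers toward a shared distilled policy \pi_0:

J(\pi_0, \{\pi_i\}) \;=\; \sum_i \mathbb{E}_{\pi_i}\Big[ \sum_{t \ge 0} \gamma^t \, r_i(a_t, s_t) \Big]
  \;-\; c_{\mathrm{KL}} \sum_i \mathbb{E}_{\pi_i}\Big[ \sum_{t \ge 0} \gamma^t \, \mathrm{KL}\big( \pi_i(\cdot \mid s_t) \,\|\, \pi_0(\cdot \mid s_t) \big) \Big]
  \;+\; c_{\mathrm{Ent}} \sum_i \mathbb{E}_{\pi_i}\Big[ \sum_{t \ge 0} \gamma^t \, \mathcal{H}\big( \pi_i(\cdot \mid s_t) \big) \Big]

Read this way, maximising over each task policy \pi_i is entropy-regularised reinforcement learning with an additional KL penalty pulling the worker toward \pi_0, while maximising over \pi_0 with the task policies held fixed is a supervised distillation step that moves \pi_0 toward the centroid of the task policies.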
--
Yee Whye Teh is an RSIV Professor of Statistical Machine Learning at the University of Oxford. He is an ATI Faculty Fellow and also spends time at Google DeepMind working on AI research. He is currently a European Research Council Consolidator Fellow, and was a Tutorial Fellow at University College. He obtained his PhD at the University of Toronto under Professor Geoffrey E. Hinton, and did postdoctoral work at the University of California at Berkeley under Professor Michael I. Jordan and at the National University of Singapore. He was a Lecturer and then a Reader at the Gatsby Computational Neuroscience Unit, UCL from January 2007 to August 2012.
His research interests are in machine learning and computational statistics, particularly probabilistic methods, Bayesian nonparametrics and deep learning. He develops novel models as well as efficient algorithms for inference and learning.
He was programme co-chair with Michael Titterington of the International Conference on Artificial Intelligence and Statistics (AISTATS) 2010, programme co-chair with Doina Precup of the International Conference on Machine Learning (ICML) 2017, and is/has been an associate editor for Bayesian Analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, Machine Learning Journal, Statistical Science, Journal of the Royal Statistical Society Series B and Journal of Machine Learning Research. He has been an area chair for NIPS, ICML and AISTATS on multiple occasions.