Zero-Shot (Human-AI) Coordination (in Hanabi) and Ridge Rider
In recent years we have seen rapid progress on a number of zero-sum benchmark problems in AI, e.g., Go, Poker, and Dota. In contrast, success in the real world requires humans to collaborate and communicate with others in settings that are, at least partially, cooperative. Recently, the card game Hanabi has been established as a new benchmark environment to fill this gap. Hanabi is particularly interesting because it is entirely focused on theory of mind, i.e., the ability to reason about the intentions, beliefs, and points of view of other agents when observing their actions. This ability is crucial in applications such as communication, assistive technologies, and autonomous driving.
We start out by introducing the zero-shot coordination setting as a new frontier for multi-agent research. This setting is partially addressed by Other-Play, a novel learning algorithm that biases learning towards more human-compatible policies.
Lastly, we introduce Ridge Rider, our new algorithm, which addresses both zero-shot coordination and other optimization problems in which the objective we care about cannot, by definition, be evaluated at training time.
Bio: Jakob Foerster received a CIFAR AI Chair in 2019 and is starting as an Assistant Professor at the University of Toronto and the Vector Institute in the 2020/21 academic year. During his PhD at the University of Oxford, he helped bring deep multi-agent reinforcement learning to the forefront of AI research and interned at Google Brain, OpenAI, and DeepMind. He has since been working as a research scientist at Facebook AI Research in California, where he will continue advancing the field until his move to Toronto. He was the lead organizer of the first Emergent Communication (EmeCom) workshop at NeurIPS in 2017, which he has helped organize ever since.