Thursday Aug 03, 2023

ICLR 2023 - Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning

In this episode we discuss Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning by Anton Bakhtin, David J Wu, Adam Lerer, Jonathan Gray, Athul Paul Jacob, Gabriele Farina, Alexander H Miller, Noam Brown. The paper introduces a strategy called DiL-piKL that combines human imitation learning with reinforcement learning and planning to improve performance in the game of No-press Diplomacy. This algorithm regularizes a reward-maximizing policy towards a policy learned from human imitation, resulting in a no-regret learning algorithm. Building upon DiL-piKL, the paper proposes an extended self-play reinforcement learning algorithm called RL-DiL-piKL, which trains an agent that responds well to human play while also modeling human behavior.

Comments (0)

To leave or reply to comments, please download free Podbean or

No Comments

Copyright 2023 All rights reserved.

Podcast Powered By Podbean

Version: 20241125