Thursday Aug 03, 2023

ICLR 2023 - Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning

In this episode we discuss "Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning" by Anton Bakhtin, David J. Wu, Adam Lerer, Jonathan Gray, Athul Paul Jacob, Gabriele Farina, Alexander H. Miller, and Noam Brown. The paper introduces a planning algorithm called DiL-piKL that regularizes a reward-maximizing policy toward a policy learned by imitating human play, with the goal of improving performance in no-press Diplomacy. The authors prove that DiL-piKL is a no-regret learning algorithm under a modified utility function. Building on DiL-piKL, the paper proposes a self-play reinforcement learning algorithm called RL-DiL-piKL, which provides a model of human play while simultaneously training an agent that responds well to that model.
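
To make the regularization idea concrete, here is a minimal Python sketch of the standard closed form for a KL-regularized policy: maximizing expected value minus `lam` times the KL divergence to an anchor policy gives probabilities proportional to `anchor(a) * exp(Q(a) / lam)`. This is only an illustration, not the paper's implementation. The function name `pikl_policy`, the toy numbers, and the single fixed `lam` are assumptions made for the example; as we understand it, DiL-piKL runs an iterative procedure and draws the regularization strength from a distribution rather than fixing one value.

```python
import math

def pikl_policy(anchor, q_values, lam):
    """One-shot KL-regularized policy: pi(a) proportional to anchor(a) * exp(Q(a) / lam).

    anchor   -- probabilities from a (hypothetical) human imitation policy
    q_values -- estimated value of each action
    lam      -- regularization strength; must be positive
    """
    # Work in log space for numerical stability; actions with zero anchor
    # probability stay at zero.
    logits = [
        (math.log(p) if p > 0 else float("-inf")) + q / lam
        for p, q in zip(anchor, q_values)
    ]
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Toy example: humans favor action 0, but action 1 has the highest value.
anchor = [0.7, 0.2, 0.1]
q_values = [0.0, 1.0, 0.5]
near_human = pikl_policy(anchor, q_values, lam=1e6)   # stays close to the anchor
near_greedy = pikl_policy(anchor, q_values, lam=0.01)  # concentrates on action 1
```

A large `lam` keeps the policy close to the human-like anchor, while a small `lam` lets it drift toward pure reward maximization.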
