Thursday Aug 03, 2023
ICLR 2023 - Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning
In this episode we discuss Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning by Anton Bakhtin, David J Wu, Adam Lerer, Jonathan Gray, Athul Paul Jacob, Gabriele Farina, Alexander H Miller, Noam Brown. The paper aims to improve performance in the game of No-Press Diplomacy by combining human imitation learning with reinforcement learning and planning. It introduces DiL-piKL, an algorithm that regularizes a reward-maximizing policy toward a policy learned from human imitation and that is a no-regret learning algorithm. Building on DiL-piKL, the paper proposes a self-play reinforcement learning algorithm called RL-DiL-piKL, which trains an agent that responds well to human play while also modeling human behavior.
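To make the idea of regularizing a reward-maximizing policy toward a human-imitation policy concrete, here is a minimal sketch of the generic KL-regularized objective for a single decision. It is not the paper's DiL-piKL implementation: the function name `kl_regularized_policy`, the parameter `lam`, and the toy numbers are assumptions chosen for illustration. It relies only on the standard result that maximizing expected value minus `lam` times the KL divergence from an anchor policy gives a policy proportional to `anchor(a) * exp(Q(a) / lam)`.

```python
import math


def kl_regularized_policy(q_values, anchor_policy, lam):
    """Return the policy pi maximizing sum_a pi(a) * Q(a) - lam * KL(pi || anchor).

    q_values:      estimated value of each action (hypothetical inputs)
    anchor_policy: probabilities from an imitation-learned "human" policy
    lam:           regularization strength; larger values stay closer to the anchor
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if len(q_values) != len(anchor_policy):
        raise ValueError("q_values and anchor_policy must have the same length")

    # Work in log space for numerical stability. Actions the anchor never
    # plays (probability 0) keep probability 0 under this objective.
    logits = [
        math.log(p) + q / lam if p > 0 else float("-inf")
        for q, p in zip(q_values, anchor_policy)
    ]
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Toy example (made-up numbers): three candidate actions.
q = [1.0, 0.0, 0.5]
human = [0.2, 0.5, 0.3]
close_to_human = kl_regularized_policy(q, human, lam=1000.0)
close_to_greedy = kl_regularized_policy(q, human, lam=0.01)
```

With a large `lam` the result stays near the anchor policy, and with a small `lam` it concentrates on the highest-value action, which is the trade-off between human-like and reward-maximizing play that the episode describes.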