Paper Abstract 100 Knocks #20

Previous post ↓
ryosuke-okubo.hatenablog.com

96 PPO (2017)

Original paper: Proximal Policy Optimization Algorithms

Abstract: We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" obj…