At each step, our PPO trainer updates a `last_reward` variable in the TF graph that is never used. There are also related unused methods in several places in the codebase. This change removes them.
Jonathan Harper
6 years ago
Current commit
177ee5b8
7 files changed, 9 insertions and 62 deletions