In the previous PR, steps were processed when the env manager was reset. This was a problem for the very first reset, when we don't yet know which agent groups (and AgentManagers) the steps need to be sent to, so those steps were thrown away. This PR moves the processing of steps to advance(), so that the initial reset steps are simply processed on the next call to advance(). This also removes the need for an additional block of code in TrainerController to handle the initial reset.
/asymm-envs
GitHub
5 years ago
Current commit
65dbe0ec
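
The change described in the commit message can be illustrated with a minimal sketch. This is not the code from this commit: `EnvManagerSketch`, `AgentManagerSketch`, `set_agent_manager`, `add_experiences`, `_first_steps`, and the string "steps" are all hypothetical stand-ins, chosen only to show how deferring the processing to advance() lets the initial reset steps reach an AgentManager that is registered after the reset.

```python
from typing import Dict, List, Optional, Tuple

# A "step" here is just (agent group name, payload); purely illustrative.
Step = Tuple[str, str]


class AgentManagerSketch:
    """Hypothetical receiver of steps for one agent group."""

    def __init__(self) -> None:
        self.received: List[str] = []

    def add_experiences(self, payload: str) -> None:
        self.received.append(payload)


class EnvManagerSketch:
    """Hypothetical env manager that defers step processing to advance()."""

    def __init__(self) -> None:
        self.agent_managers: Dict[str, AgentManagerSketch] = {}
        self._first_steps: Optional[List[Step]] = None
        self._step_count = 0

    def set_agent_manager(self, group: str, manager: AgentManagerSketch) -> None:
        self.agent_managers[group] = manager

    def reset(self) -> None:
        # Only store the reset steps; at the very first reset no
        # AgentManager may be registered yet, so nothing is processed here.
        self._step_count = 0
        self._first_steps = [("group_a", "reset_step")]

    def advance(self) -> int:
        # Process steps left over from a reset before stepping the env.
        if self._first_steps is not None:
            self._process_steps(self._first_steps)
            self._first_steps = None
        self._step_count += 1
        new_steps = [("group_a", f"step_{self._step_count}")]
        return self._process_steps(new_steps)

    def _process_steps(self, steps: List[Step]) -> int:
        # Forward each step to the manager for its group, if one exists.
        processed = 0
        for group, payload in steps:
            manager = self.agent_managers.get(group)
            if manager is not None:
                manager.add_experiences(payload)
                processed += 1
        return processed
```

In this sketch, a caller can run reset(), register an `AgentManagerSketch` for "group_a" afterwards, and then call advance(); the manager receives the reset step followed by the first regular step, with no separate handling of the initial reset in the caller.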