* Proper dimensions for entropy, sum before bonus in PPO
* Make entropy reporting same as TF
* Always use separate critic
* Revert to shared
* Remove unneeded extra line
* Change entropy shape in test
* Change another entropy shape
* Add entropy summing to evaluate_actions
* Add notes about torch.abs(policy_loss)

/MLA-1734-demo-provider
GitHub
4 years ago
Current commit
e0ef30a5