
Clear cumulative_returns_since_policy_update (#2120)

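Previously, the mean reward written to the CSV file lagged far behind the rest of the code. This happened because this buffer was never cleared, so each logged mean averaged over every episode since training began.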
Branch: develop-generalizationTraining-TrainerController
Vincent(Yuan) Gao, 5 years ago
Commit: a15763f8
1 file changed, with 1 insertion and 0 deletions:
  ml-agents/mlagents/trainers/ppo/trainer.py (1 line changed)

ml-agents/mlagents/trainers/ppo/trainer.py


number_experiences=len(self.training_buffer.update_buffer["actions"]),
mean_return=float(np.mean(self.cumulative_returns_since_policy_update)),
)
self.cumulative_returns_since_policy_update = []
n_sequences = max(
int(self.trainer_parameters["batch_size"] / self.policy.sequence_length), 1
)
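The following is a minimal sketch of why the never-cleared buffer made the logged mean lag. It assumes one summary row is written per policy update and uses made-up episode returns. `ReturnTracker`, `record_episode`, `update_policy`, and `simulate` are hypothetical names for illustration, not part of the ml-agents API. Only the attribute name `cumulative_returns_since_policy_update` comes from the diff. `statistics.fmean` stands in for the `np.mean` call used in the trainer.

```python
from statistics import fmean


class ReturnTracker:
    """Hypothetical stand-in for the trainer's return bookkeeping (not the ml-agents API)."""

    def __init__(self, clear_after_update: bool) -> None:
        self.clear_after_update = clear_after_update
        # Same attribute name as in the diff; collects per-episode returns.
        self.cumulative_returns_since_policy_update = []

    def record_episode(self, episode_return: float) -> None:
        self.cumulative_returns_since_policy_update.append(episode_return)

    def update_policy(self) -> float:
        # Mean of everything currently in the buffer, as logged at update time.
        mean_return = fmean(self.cumulative_returns_since_policy_update)
        if self.clear_after_update:
            # The one-line fix: start a fresh buffer after each update.
            self.cumulative_returns_since_policy_update = []
        return mean_return


def simulate(clear_after_update: bool) -> list:
    tracker = ReturnTracker(clear_after_update)
    logged = []
    # Three update cycles of four episodes each; returns improve 0 -> 10 -> 20.
    for level in (0.0, 10.0, 20.0):
        for _ in range(4):
            tracker.record_episode(level)
        logged.append(tracker.update_policy())
    return logged


print(simulate(clear_after_update=False))  # [0.0, 5.0, 10.0]
print(simulate(clear_after_update=True))   # [0.0, 10.0, 20.0]
```

Without the reset, each logged value averages over every earlier episode, so it trails the agent's current performance. With the reset, each value reflects only the episodes collected since the last update.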
