
[Cold Fix] Making the episode length and mean reward more accurate for the first episode (#657)

/develop-generalizationTraining-TrainerController
GitHub · 7 years ago
Current commit: 755be43e
2 changed files with 2 additions and 2 deletions:

1. python/unitytrainers/bc/trainer.py (3)
2. python/unitytrainers/ppo/trainer.py (1)

python/unitytrainers/bc/trainer.py (3)

```diff
 if stored_info_student is None:
     continue
 else:
     idx = stored_info_student.agents.index(agent_id)
-    if not stored_info_student.local_done[idx]:
+    if not next_info_student.local_done[next_idx]:
         if agent_id not in self.cumulative_rewards:
             self.cumulative_rewards[agent_id] = 0
         self.cumulative_rewards[agent_id] += next_info_student.rewards[next_idx]
```

python/unitytrainers/ppo/trainer.py (1)

```diff
 self.training_buffer[agent_id]['action_probs'].append(a_dist[idx])
 self.training_buffer[agent_id]['value_estimates'].append(value[idx][0])
+if not next_info.local_done[next_idx]:
     if agent_id not in self.cumulative_rewards:
         self.cumulative_rewards[agent_id] = 0
     self.cumulative_rewards[agent_id] += next_info.rewards[next_idx]
```
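Both hunks apply the same fix: the per-agent cumulative reward is only accumulated while the *next* step's `local_done` flag is false, so a reward observed at an episode boundary is not folded into the wrong episode's total, which is what skewed the first episode's length and mean reward. A minimal sketch of that gating pattern (the helper name `accumulate` is hypothetical, not part of ml-agents):

```python
def accumulate(agent_id, reward, next_done, cumulative_rewards):
    """Accumulate reward for one agent, gated on the NEXT step's done flag.

    Mirrors the pattern in the diffs above: when next_done is True the
    episode has just ended, so the reward is not added to the running total.
    """
    if not next_done:
        # Lazily initialize the agent's running total, as the trainers do.
        if agent_id not in cumulative_rewards:
            cumulative_rewards[agent_id] = 0
        cumulative_rewards[agent_id] += reward
    return cumulative_rewards.get(agent_id, 0)


rewards = {}
accumulate("agent_0", 1.0, next_done=False, cumulative_rewards=rewards)
accumulate("agent_0", 5.0, next_done=True, cumulative_rewards=rewards)
# rewards["agent_0"] is still 1.0: the terminal-step reward was not accumulated
```

Keying the check on the stored (previous) info instead, as the old bc/trainer.py line did, would accumulate one extra step at each episode boundary.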
