
Develop fix cumulative reward (#725)

* [Cold Fix] Split the way cumulative rewards and episode length are counted
The reward is appended at each step to the cumulative reward
The episode count is ONLY incremented when d_t+1 is false
/develop-generalizationTraining-TrainerController
Arthur Juliani, 6 years ago
Commit 9477eaa9
2 files changed, with 4 additions and 5 deletions:
  1. python/unitytrainers/bc/trainer.py (6 changes)
  2. python/unitytrainers/ppo/trainer.py (3 changes)

python/unitytrainers/bc/trainer.py (6 changes)


        continue
    else:
        next_idx = next_info_student.agents.index(agent_id)
        if agent_id not in self.cumulative_rewards:
            self.cumulative_rewards[agent_id] = 0
        self.cumulative_rewards[agent_id] += next_info_student.rewards[next_idx]
        if agent_id not in self.episode_steps:
            self.episode_steps[agent_id] = 0
        self.episode_steps[agent_id] += 1

python/unitytrainers/ppo/trainer.py (3 changes)


self.training_buffer[agent_id]['rewards'].append(next_info.rewards[next_idx])
self.training_buffer[agent_id]['action_probs'].append(a_dist[idx])
self.training_buffer[agent_id]['value_estimates'].append(value[idx][0])
if not next_info.local_done[next_idx]:
    if agent_id not in self.episode_steps:
        self.episode_steps[agent_id] = 0
    self.episode_steps[agent_id] += 1
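The counting scheme described in the commit message can be reproduced in isolation. The sketch below is a standalone illustration only; the `accumulate` helper and its tuple input are hypothetical and not part of unitytrainers. It shows the split the commit makes: the reward is added to the cumulative total on every step, while the episode-step counter is incremented only when the step is not terminal.

```python
def accumulate(steps):
    """steps: list of (agent_id, reward, done) tuples, one per environment step.

    Hypothetical standalone reproduction of the per-agent bookkeeping
    pattern used in the trainers after this fix.
    """
    cumulative_rewards = {}
    episode_steps = {}
    for agent_id, reward, done in steps:
        # Reward is counted on EVERY step, including the terminal one.
        if agent_id not in cumulative_rewards:
            cumulative_rewards[agent_id] = 0
        cumulative_rewards[agent_id] += reward
        # Episode length is only incremented when d_t+1 is false.
        if not done:
            if agent_id not in episode_steps:
                episode_steps[agent_id] = 0
            episode_steps[agent_id] += 1
    return cumulative_rewards, episode_steps
```

For a two-step episode whose second step is terminal, the terminal reward still reaches the cumulative total, but the terminal transition does not extend the episode length, which is exactly the asymmetry the old code lacked.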
