Merge branch 'self-play-mutex' into soccer-2v1

/asymm-envs
Andrew Cohen, 5 years ago
Current commit
aa18bef6
1 file changed, with 2 insertions and 2 deletions
  1. ml-agents/mlagents/trainers/ghost/trainer.py (4 lines changed)

ml-agents/mlagents/trainers/ghost/trainer.py (4 lines changed)


         i.e. in asymmetric games. We assume the last reward determines the winner.
         :param trajectory: Trajectory.
         """
-        if trajectory.done_reached and not trajectory.max_step_reached:
-            # Assumption is that final reward is 1/.5/0 for win/draw/loss
+        if trajectory.done_reached:
+            # Assumption is that final reward is >0/0/<0 for win/draw/loss
             final_reward = trajectory.steps[-1].reward
             result = 0.5
             if final_reward > 0:
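
The hunk is cut off after `if final_reward > 0:`, so the rest of the branch is not visible here. As a minimal sketch of the convention stated in the new comment (final reward >0/0/<0 for win/draw/loss), the mapping could look like the following. The function name `result_from_final_reward` and the 1.0/0.0 scores for win/loss are assumptions for illustration only; just the 0.5 default for a draw appears in the diff.

```python
def result_from_final_reward(final_reward: float) -> float:
    """Map the sign of the last reward to a game result.

    Hypothetical helper, not part of the commit: the 1.0 (win) and
    0.0 (loss) values are assumed; only the 0.5 draw default is shown
    in the diff above.
    """
    result = 0.5  # draw unless the final reward says otherwise
    if final_reward > 0:
        result = 1.0  # assumed score for a win
    elif final_reward < 0:
        result = 0.0  # assumed score for a loss
    return result
```

Under this reading only the sign of the final reward matters, not its magnitude, which is what lets asymmetric environments use different reward scales for each team.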
