
doc fix

/develop/cubewars
Andrew Cohen, 5 years ago
Commit 947cffb8
1 file changed, 2 insertions and 3 deletions
  1. docs/Training-Self-Play.md (5 changed lines)



```diff
 Recommended Range : 10000-100000
-### Play against current best ratio
+### Play against latest model ratio
-an agent will play against the current opponent. With probability
+an agent will play against the latest opponent policy. With probability
 1 - `play_against_latest_model_ratio`, the agent will play against a snapshot of its
 opponent from a past iteration.
```
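The opponent-sampling behavior the changed lines describe can be sketched as follows. This is a minimal illustration, not ML-Agents internals; the function and variable names are hypothetical:

```python
import random

def sample_opponent(latest_policy, snapshots, play_against_latest_model_ratio):
    """Pick the opponent for the next episode (illustrative sketch).

    With probability play_against_latest_model_ratio the agent faces the
    latest opponent policy; otherwise it faces a uniformly sampled past
    snapshot. If no snapshots exist yet, the latest policy is used.
    """
    if snapshots and random.random() >= play_against_latest_model_ratio:
        return random.choice(snapshots)
    return latest_policy
```

With a ratio of 1.0 the agent always trains against the latest policy (less stable, but tracks the current opponent); lower ratios mix in past snapshots for a more diverse, stationary opponent pool.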

Note: this implementation supports any number of teams, but Elo is only applicable to games with two teams. Implementing a reliable metric for measuring progress in multi-team scenarios is ongoing work. These scenarios can still train, though as of now, reward and qualitative observations are the only metrics by which we can judge performance.
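Why Elo is limited to two teams becomes clear from the update rule: it needs a single pairwise result and an expected score computed from one rating gap. A standard two-player Elo update (illustrative, not the ML-Agents implementation) looks like:

```python
def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Standard two-player Elo update.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    The expected score depends only on the rating gap between the two
    sides, so the formula has no natural extension to 3+ teams.
    """
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta
```

The update is zero-sum: whatever rating one side gains, the other loses, which is another property that only makes sense for exactly two teams.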