Recommended Range : 10000-100000
### Play against latest model ratio

The `play_against_latest_model_ratio` parameter corresponds to the probability
an agent will play against the latest opponent policy. With probability
1 - `play_against_latest_model_ratio`, the agent will play against a snapshot of its
opponent from a past iteration.
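As an illustration of this sampling rule, the sketch below picks an opponent for a single game. The function and variable names (`sample_opponent`, `snapshots`, `latest_policy`) are hypothetical and not part of the trainer's actual API.

```python
import random

def sample_opponent(latest_policy, snapshots, play_against_latest_model_ratio):
    """Pick the opponent policy for the next game (illustrative only).

    With probability `play_against_latest_model_ratio`, return the latest
    opponent policy; otherwise return a uniformly sampled past snapshot.
    """
    if snapshots and random.random() >= play_against_latest_model_ratio:
        return random.choice(snapshots)  # snapshot from a past iteration
    return latest_policy  # latest opponent policy
```

A higher ratio therefore means the agent more often faces the most recent opponent, while a lower ratio mixes in older snapshots more frequently.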
Note that this implementation supports any number of teams, but ELO is only applicable to games with
two teams. Implementing a reliable metric for measuring progress in multi-team scenarios is ongoing
work. These scenarios can still train, though as of now, reward and qualitative observations are the
only metrics by which we can judge performance.
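For reference, the two-team ELO metric follows the standard Elo update. Below is a minimal sketch of that formula; the K-factor of 16 and the helper name are illustrative assumptions, not taken from this implementation.

```python
def elo_update(rating_a, rating_b, result_a, k=16.0):
    """Return updated (rating_a, rating_b) after a two-team game.

    result_a is 1.0 if team A won, 0.5 for a draw, and 0.0 if team A lost.
    """
    # Expected score of team A given the rating difference.
    expected_a = 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))
    delta = k * (result_a - expected_a)
    return rating_a + delta, rating_b - delta
```

For example, if both teams start at 1200, each has an expected score of 0.5; a win by team A moves the ratings to 1208 and 1192.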