Recommended Range : 10000-100000
### Play against latest model ratio

The `play_against_latest_model_ratio` parameter corresponds to the probability
an agent will play against the latest opponent policy. With probability
1 - `play_against_latest_model_ratio`, the agent will play against a snapshot of its
opponent from a past iteration.
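As an illustration of this sampling rule, the sketch below picks an opponent for a single game. The function and variable names (`sample_opponent`, `snapshots`, `latest_policy`) are hypothetical and not part of the trainer's actual API.

```python
import random

def sample_opponent(latest_policy, snapshots, play_against_latest_model_ratio):
    """Pick the opponent policy for the next game (illustrative only).

    With probability `play_against_latest_model_ratio`, return the latest
    opponent policy; otherwise return a uniformly sampled past snapshot.
    """
    if snapshots and random.random() >= play_against_latest_model_ratio:
        return random.choice(snapshots)  # snapshot from a past iteration
    return latest_policy  # latest opponent policy
```

A higher ratio therefore means the agent more often faces the most recent opponent, while a lower ratio mixes in older snapshots more frequently.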
Note that this implementation supports any number of teams, but ELO is only applicable to games with
two teams. Implementing a reliable metric for measuring progress in multi-team scenarios is ongoing
work. These scenarios can still train, though as of now, reward and qualitative observations are the
only metrics by which we can judge performance.
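For reference, the two-team ELO metric follows the standard Elo update. Below is a minimal sketch of that formula; the K-factor of 16 and the helper name are illustrative assumptions, not taken from this implementation.

```python
def elo_update(rating_a, rating_b, result_a, k=16.0):
    """Return updated (rating_a, rating_b) after a two-team game.

    result_a is 1.0 if team A won, 0.5 for a draw, and 0.0 if team A lost.
    """
    # Expected score of team A given the rating difference.
    expected_a = 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))
    delta = k * (result_a - expected_a)
    return rating_a + delta, rating_b - delta
```

For example, if both teams start at 1200, each has an expected score of 0.5; a win by team A moves the ratings to 1208 and 1192.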