
strikers 2v1 self play doc update

/asymm-envs
Andrew Cohen, 4 years ago
Current commit
7ccb3e0d
1 file changed, 5 insertions(+), 5 deletions(-)
  1. docs/Training-Self-Play.md (10 changed lines)

docs/Training-Self-Play.md


A symmetric game is one in which opposing agents are equal in form, function and objective. Examples of symmetric games
are our Tennis and Soccer example environments. In reinforcement learning, this means both agents have the same observation and
action spaces and learn from the same reward function and so *they can share the same policy*. In asymmetric games,
this is not the case. An example of an asymmetric game is our Strikers Vs Goalie example environment. Agents in these
types of games do not always have the same observation or action spaces and so sharing policy networks is not
necessarily ideal.

Note that in asymmetric games, the agents must have both different Behavior Names *and* different team IDs! Then, specify the trainer configuration
for each Behavior Name in your scene as you would normally, and remember to include the self-play hyperparameter hierarchy!
For examples of how to use this feature, you can see the trainer configurations and agent prefabs for our Tennis, Soccer, and
Strikers Vs Goalie environments.
Tennis and Soccer provide examples of symmetric games and Strikers Vs Goalie provides an example of an asymmetric game.
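
To make the "self-play hyperparameter hierarchy" requirement concrete, here is a minimal sketch of what a trainer
configuration for an asymmetric game might look like. The Behavior Names (`Striker`, `Goalie`) and all values below are
placeholders rather than the shipped example settings, the exact keys can differ between ML-Agents versions, and the
differing team IDs are assigned on each agent's Behavior Parameters in the Unity scene, not in this file.

```yaml
# Sketch only: each Behavior Name gets its own trainer entry and its own
# self_play section (placeholder names and values).
Striker:                  # Behavior Name used by one team
  trainer: ppo
  self_play:
    window: 10
    save_steps: 50000
    team_change: 200000
    swap_steps: 100000    # chosen per team; see the swap_steps discussion below

Goalie:                   # Behavior Name used by the opposing team
  trainer: ppo
  self_play:
    window: 10
    save_steps: 50000
    team_change: 200000
    swap_steps: 25000     # chosen per team; see the swap_steps discussion below
```
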
## Best Practices Training with Self-Play

The `swap_steps` parameter corresponds to the number of *ghost steps* (not trainer steps) between swapping the opponent's policy with a different snapshot.
A 'ghost step' refers to a step taken by an agent *that is following a fixed policy and not learning*. The reason for this distinction is that in asymmetric games,
we may have teams with an unequal number of agents, e.g. a 2v1 scenario like our Strikers Vs Goalie example environment. The team with two agents collects
twice as many agent steps per environment step as the team with one agent. Thus, the two teams' `swap_steps` values will need to be distinct to ensure that the same number
of trainer steps corresponds to the same number of opponent swaps for each team. The formula for `swap_steps`, if
a user desires `x` swaps of a team with `num_agents` agents against an opponent team with `num_opponent_agents` agents during `team_change` total steps, is
`swap_steps = (num_agents / num_opponent_agents) * (team_change / x)`.
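
To make the formula concrete, here is a worked example with illustrative numbers for a 2v1 game like Strikers Vs Goalie.
The Behavior Names are the same placeholders as in the sketch above, and `x = 4` swaps over `team_change = 200000` steps
is an arbitrary choice, not a recommendation.

```yaml
# Worked example for a 2v1 game (illustrative numbers only):
# desired swaps x = 4, team_change = 200000
#   Two-agent team:  swap_steps = (2 / 1) * (200000 / 4) = 100000
#   One-agent team:  swap_steps = (1 / 2) * (200000 / 4) = 25000
Striker:
  self_play:
    swap_steps: 100000
Goalie:
  self_play:
    swap_steps: 25000
```

With these distinct `swap_steps` values, the same number of trainer steps corresponds to the same number of opponent
swaps for each team.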
