
renamed controller methods/doc fixes

/develop/cubewars
Andrew Cohen, 4 years ago
Commit c60d0c5a
3 changed files with 15 additions and 13 deletions
  1. docs/Training-Self-Play.md (4 changes)
  2. ml-agents/mlagents/trainers/ghost/controller.py (18 changes)
  3. ml-agents/mlagents/trainers/ghost/trainer.py (6 changes)

docs/Training-Self-Play.md (4 changes)


ML-Agents provides the functionality to train both symmetric and asymmetric adversarial games with
[Self-Play](https://openai.com/blog/competitive-self-play/).
- A symmetric game is one in which opposing agents are equal in form, function snd objective. Examples of symmetric games
+ A symmetric game is one in which opposing agents are equal in form, function and objective. Examples of symmetric games
are our Tennis and Soccer example environments. In reinforcement learning, this means both agents have the same observation and
action spaces and learn from the same reward function and so *they can share the same policy*. In asymmetric games,
this is not the case. Examples of asymmetric games are Hide and Seek or Strikers vs Goalie in Soccer. Agents in these

### Swap Steps
- The `swap_steps` parameter corresponds to the number of *ghost steps* (note, not trainer steps) between swapping the opponents policy with a different snapshot.
+ The `swap_steps` parameter corresponds to the number of *ghost steps* (not trainer steps) between swapping the opponents policy with a different snapshot.
A 'ghost step' refers to a step taken by an agent *that is following a fixed policy and not learning*. The reason for this distinction is that in asymmetric games,
we may have teams with an unequal number of agents e.g. the 2v1 scenario in our Strikers Vs Goalie environment. The team with two agents collects
twice as many agent steps per environment step as the team with one agent. Thus, these two values will need to be distinct to ensure that the same number
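The paragraph above is truncated by the diff, but the point it is building toward is proportionality: when one team fields more agents than the other, its `swap_steps` must be scaled so that both teams swap opponent snapshots the same number of times per team change. A hedged Python sketch of that bookkeeping (the helper name, signature, and the example numbers are illustrative, not part of this commit):

```python
def swap_steps_for_team(
    num_team_agents: int,        # agents on the team whose swap_steps we are setting
    num_opponent_agents: int,    # agents on the opposing team
    team_change: int,            # trainer steps between changes of the learning team
    swaps_per_team_change: int,  # how many opponent swaps we want in that window
) -> int:
    # Trainer steps are counted on the learning team and ghost steps on the fixed team,
    # so the team-size ratio converts between the two step counts.
    return int((num_team_agents / num_opponent_agents) * (team_change / swaps_per_team_change))


# 2v1 Strikers Vs Goalie, aiming for 4 swaps per 200000-step team change:
print(swap_steps_for_team(1, 2, 200000, 4))  # 25000 for the single goalie
print(swap_steps_for_team(2, 1, 200000, 4))  # 100000 for the two strikers
```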

ml-agents/mlagents/trainers/ghost/controller.py (18 changes)


GhostController contains a queue of team ids. GhostTrainers subscribe to the GhostController and query
it to get the current learning team. The GhostController cycles through team ids every 'swap_interval'
which corresponds to the number of trainer steps between changing learning teams.
The GhostController is a unique object and there can only be one per training run.
"""
def __init__(self, maxlen: int = 10):

self._learning_team: int = -1
# Dict from team id to GhostTrainer for ELO calculation
self._ghost_trainers: Dict[int, GhostTrainer] = {}
+ @property
+ def get_learning_team(self) -> int:
+ """
+ Returns the current learning team.
+ :return: The learning team id
+ """
+ return self._learning_team
def subscribe_team_id(self, team_id: int, trainer: GhostTrainer) -> None:
"""

else:
self._queue.append(team_id)
- def get_learning_team(self) -> int:
- """
- Returns the current learning team.
- :return: The learning team id
- """
- return self._learning_team
- def finish_training(self, step: int) -> None:
+ def change_training_team(self, step: int) -> None:
"""
The current learning team is added to the end of the queue and then updated with the
next in line.
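Read together with the trainer.py changes below, the renamed API boils down to a small rotation pattern: a queue of team ids, a `get_learning_team` property, and `change_training_team` to rotate the queue. A minimal, self-contained sketch of that pattern (simplified; not the actual ml-agents class, which also tracks `GhostTrainer`s for ELO):

```python
from collections import deque
from typing import Deque


class MiniGhostController:
    """Simplified sketch of the rotation pattern described in the docstring above."""

    def __init__(self, maxlen: int = 10):
        self._queue: Deque[int] = deque(maxlen=maxlen)
        self._learning_team: int = -1

    @property
    def get_learning_team(self) -> int:
        # Exposed as a property, matching the rename in this commit.
        return self._learning_team

    def subscribe_team_id(self, team_id: int) -> None:
        # First subscriber becomes the initial learning team; later ones wait in the queue.
        if self._learning_team < 0:
            self._learning_team = team_id
        else:
            self._queue.append(team_id)

    def change_training_team(self, step: int) -> None:
        # Rotate: current learning team goes to the back, next in line takes over.
        # `step` is kept only to mirror the real signature; it is unused here.
        self._queue.append(self._learning_team)
        self._learning_team = self._queue.popleft()


controller = MiniGhostController()
controller.subscribe_team_id(0)
controller.subscribe_team_id(1)
controller.change_training_team(step=50000)
assert controller.get_learning_team == 1
```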

ml-agents/mlagents/trainers/ghost/trainer.py (6 changes)


self.next_summary_step = self.trainer.next_summary_step
self.trainer.advance()
if self.get_step - self.last_team_change > self.steps_to_train_team:
- self.controller.finish_training(self.get_step)
+ self.controller.change_training_team(self.get_step)
- next_learning_team = self.controller.get_learning_team()
+ next_learning_team = self.controller.get_learning_team
# CASE 1: Current learning team is managed by this GhostTrainer.
# If the learning team changes, the following loop over queues will push the

self._save_snapshot() # Need to save after trainer initializes policy
self.trainer.add_policy(parsed_behavior_id, policy)
- self._learning_team = self.controller.get_learning_team()
+ self._learning_team = self.controller.get_learning_team
self.wrapped_trainer_team = team_id
def get_policy(self, name_behavior_id: str) -> TFPolicy:
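At the call sites, the rename is mechanical: `finish_training` becomes `change_training_team`, and `get_learning_team` is read as a property instead of being called. A hedged fragment showing that shape, reusing the `MiniGhostController` sketch above (the step-counting parameters are placeholders, not the real GhostTrainer attributes):

```python
def maybe_rotate_team(
    get_step: int,
    last_team_change: int,
    steps_to_train_team: int,
    controller: MiniGhostController,
) -> int:
    # Once the current team has trained long enough, ask the controller to rotate.
    if get_step - last_team_change > steps_to_train_team:
        controller.change_training_team(get_step)  # renamed from finish_training
    # get_learning_team is now a property, so it is read without call parentheses.
    return controller.get_learning_team
```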
