
renamed controller methods/doc fixes

/develop/cubewars
Andrew Cohen, 4 years ago
Commit c60d0c5a
3 changed files with 15 additions and 13 deletions
  1. docs/Training-Self-Play.md (4 changes)
  2. ml-agents/mlagents/trainers/ghost/controller.py (18 changes)
  3. ml-agents/mlagents/trainers/ghost/trainer.py (6 changes)

docs/Training-Self-Play.md (4 changes)


ML-Agents provides the functionality to train both symmetric and asymmetric adversarial games with
[Self-Play](https://openai.com/blog/competitive-self-play/).
- A symmetric game is one in which opposing agents are equal in form, function snd objective. Examples of symmetric games
+ A symmetric game is one in which opposing agents are equal in form, function and objective. Examples of symmetric games
are our Tennis and Soccer example environments. In reinforcement learning, this means both agents have the same observation and
action spaces and learn from the same reward function and so *they can share the same policy*. In asymmetric games,
this is not the case. Examples of asymmetric games are Hide and Seek or Strikers vs Goalie in Soccer. Agents in these

### Swap Steps
- The `swap_steps` parameter corresponds to the number of *ghost steps* (note, not trainer steps) between swapping the opponents policy with a different snapshot.
+ The `swap_steps` parameter corresponds to the number of *ghost steps* (not trainer steps) between swapping the opponents policy with a different snapshot.
A 'ghost step' refers to a step taken by an agent *that is following a fixed policy and not learning*. The reason for this distinction is that in asymmetric games,
we may have teams with an unequal number of agents e.g. the 2v1 scenario in our Strikers Vs Goalie environment. The team with two agents collects
twice as many agent steps per environment step as the team with one agent. Thus, these two values will need to be distinct to ensure that the same number
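The paragraph above is truncated by the diff, but the point it is building toward is proportionality: when one team fields more agents than the other, its `swap_steps` must be scaled so that both teams swap opponent snapshots the same number of times per team change. A hedged Python sketch of that bookkeeping (the helper name, signature, and the example numbers are illustrative, not part of this commit):

```python
def swap_steps_for_team(
    num_team_agents: int,        # agents on the team whose swap_steps we are setting
    num_opponent_agents: int,    # agents on the opposing team
    team_change: int,            # trainer steps between changes of the learning team
    swaps_per_team_change: int,  # how many opponent swaps we want in that window
) -> int:
    # Trainer steps are counted on the learning team and ghost steps on the fixed team,
    # so the team-size ratio converts between the two step counts.
    return int((num_team_agents / num_opponent_agents) * (team_change / swaps_per_team_change))


# 2v1 Strikers Vs Goalie, aiming for 4 swaps per 200000-step team change:
print(swap_steps_for_team(1, 2, 200000, 4))  # 25000 for the single goalie
print(swap_steps_for_team(2, 1, 200000, 4))  # 100000 for the two strikers
```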

ml-agents/mlagents/trainers/ghost/controller.py (18 changes)


GhostController contains a queue of team ids. GhostTrainers subscribe to the GhostController and query
it to get the current learning team. The GhostController cycles through team ids every 'swap_interval'
which corresponds to the number of trainer steps between changing learning teams.
The GhostController is a unique object and there can only be one per training run.
"""
def __init__(self, maxlen: int = 10):

self._learning_team: int = -1
# Dict from team id to GhostTrainer for ELO calculation
self._ghost_trainers: Dict[int, GhostTrainer] = {}
+ @property
+ def get_learning_team(self) -> int:
+ """
+ Returns the current learning team.
+ :return: The learning team id
+ """
+ return self._learning_team
def subscribe_team_id(self, team_id: int, trainer: GhostTrainer) -> None:
"""

else:
self._queue.append(team_id)
- def get_learning_team(self) -> int:
- """
- Returns the current learning team.
- :return: The learning team id
- """
- return self._learning_team
- def finish_training(self, step: int) -> None:
+ def change_training_team(self, step: int) -> None:
"""
The current learning team is added to the end of the queue and then updated with the
next in line.
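Read together with the trainer.py changes below, the renamed API boils down to a small rotation pattern: a queue of team ids, a `get_learning_team` property, and `change_training_team` to rotate the queue. A minimal, self-contained sketch of that pattern (simplified; not the actual ml-agents class, which also tracks `GhostTrainer`s for ELO):

```python
from collections import deque
from typing import Deque


class MiniGhostController:
    """Simplified sketch of the rotation pattern described in the docstring above."""

    def __init__(self, maxlen: int = 10):
        self._queue: Deque[int] = deque(maxlen=maxlen)
        self._learning_team: int = -1

    @property
    def get_learning_team(self) -> int:
        # Exposed as a property, matching the rename in this commit.
        return self._learning_team

    def subscribe_team_id(self, team_id: int) -> None:
        # First subscriber becomes the initial learning team; later ones wait in the queue.
        if self._learning_team < 0:
            self._learning_team = team_id
        else:
            self._queue.append(team_id)

    def change_training_team(self, step: int) -> None:
        # Rotate: current learning team goes to the back, next in line takes over.
        # `step` is kept only to mirror the real signature; it is unused here.
        self._queue.append(self._learning_team)
        self._learning_team = self._queue.popleft()


controller = MiniGhostController()
controller.subscribe_team_id(0)
controller.subscribe_team_id(1)
controller.change_training_team(step=50000)
assert controller.get_learning_team == 1
```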

ml-agents/mlagents/trainers/ghost/trainer.py (6 changes)


self.next_summary_step = self.trainer.next_summary_step
self.trainer.advance()
if self.get_step - self.last_team_change > self.steps_to_train_team:
- self.controller.finish_training(self.get_step)
+ self.controller.change_training_team(self.get_step)
- next_learning_team = self.controller.get_learning_team()
+ next_learning_team = self.controller.get_learning_team
# CASE 1: Current learning team is managed by this GhostTrainer.
# If the learning team changes, the following loop over queues will push the

self._save_snapshot() # Need to save after trainer initializes policy
self.trainer.add_policy(parsed_behavior_id, policy)
- self._learning_team = self.controller.get_learning_team()
+ self._learning_team = self.controller.get_learning_team
self.wrapped_trainer_team = team_id
def get_policy(self, name_behavior_id: str) -> TFPolicy:
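At the call sites, the rename is mechanical: `finish_training` becomes `change_training_team`, and `get_learning_team` is read as a property instead of being called. A hedged fragment showing that shape, reusing the `MiniGhostController` sketch above (the step-counting parameters are placeholders, not the real GhostTrainer attributes):

```python
def maybe_rotate_team(
    get_step: int,
    last_team_change: int,
    steps_to_train_team: int,
    controller: MiniGhostController,
) -> int:
    # Once the current team has trained long enough, ask the controller to rotate.
    if get_step - last_team_change > steps_to_train_team:
        controller.change_training_team(get_step)  # renamed from finish_training
    # get_learning_team is now a property, so it is read without call parentheses.
    return controller.get_learning_team
```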
