
removed team-change CLI

/develop/cubewars
Andrew Cohen, 5 years ago
Current commit bc611906
5 files changed, 6 insertions(+), 15 deletions(-)
  1. config/trainer_config.yaml (2 changes)
  2. docs/Training-Self-Play.md (2 changes)
  3. ml-agents/mlagents/trainers/ghost/trainer.py (4 changes)
  4. ml-agents/mlagents/trainers/learn.py (10 changes)
  5. ml-agents/mlagents/trainers/trainer_util.py (3 changes)

config/trainer_config.yaml (2 changes)


  play_against_current_best_ratio: 0.5
  save_steps: 50000
  swap_steps: 50000
+ team_change: 100000
  Soccer:
  normalize: false

  play_against_current_best_ratio: 0.5
  save_steps: 50000
  swap_steps: 50000
+ team_change: 100000
  CrawlerStatic:
  normalize: true
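
As a quick sanity check on the new config entry, here is a minimal Python sketch of reading it back with PyYAML. The flat key layout and the Soccer section name are taken from the hunk above; whether the real loader nests these keys differently is not shown here, and the 50000 fallback mirrors the default of the CLI flag removed in learn.py below.

import yaml

# Illustrative only, not the ml-agents config loader.
with open("config/trainer_config.yaml") as f:
    trainer_config = yaml.safe_load(f)

# Pull the new per-behavior setting, falling back to the old CLI default of 50000.
soccer_params = trainer_config["Soccer"]
team_change = soccer_params.get("team_change", 50000)  # 100000 in this commit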

docs/Training-Self-Play.md (2 changes)


  ML-Agents provides the functionality to train both symmetric and asymmetric adversarial games with
  [Self-Play](https://openai.com/blog/competitive-self-play/).
  A symmetric game is one in which opposing agents are equal in form, function and objective. Examples of symmetric games
- are Tennis and Soccer. In reinforcement learning, this means both agents have the same observation and
+ are our Tennis and Soccer example environments. In reinforcement learning, this means both agents have the same observation and
  action spaces and learn from the same reward function and so *they can share the same policy*. In asymmetric games,
  this is not the case. Examples of asymmetric games are Hide and Seek or Strikers vs Goalie in Soccer. Agents in these
  types of games do not always have the same observation or action spaces and so sharing policy networks is not
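
As a toy illustration of the policy-sharing point above (not ML-Agents code): in a symmetric game both team ids can be served by the same policy object, while an asymmetric game needs one per team.

class Policy:
    """Stand-in for a trained policy network."""

shared = Policy()
symmetric_policies = {0: shared, 1: shared}        # e.g. Tennis, Soccer: one set of weights
asymmetric_policies = {0: Policy(), 1: Policy()}   # e.g. Strikers vs Goalie: separate weights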

ml-agents/mlagents/trainers/ghost/trainer.py (4 changes)


  self.trainer.advance()
  if self.get_step - self.last_team_change > self.steps_to_train_team:
-     self.controller.finish_training()
+     self.controller.finish_training(self.get_step)
      self.last_team_change = self.get_step
  next_learning_team = self.controller.get_learning_team()
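
The hunk above has the trainer counting its own steps (steps_to_train_team presumably now comes from the team_change config entry) and telling the controller when a training window ends. Below is a minimal stand-in with the same call shapes, assuming a simple rotation over team ids; it is a sketch, not the actual GhostController.

class GhostControllerSketch:
    """Stand-in matching the calls above: finish_training(step) and get_learning_team()."""

    def __init__(self, num_teams: int = 2):
        self._num_teams = num_teams
        self._learning_team = 0
        self._changed_at_step = 0

    def finish_training(self, step: int) -> None:
        # Record when the switch happened and rotate which team id learns next.
        self._changed_at_step = step
        self._learning_team = (self._learning_team + 1) % self._num_teams

    def get_learning_team(self) -> int:
        return self._learning_team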

      ] = policy.get_weights()
      self._save_snapshot()  # Need to save after trainer initializes policy
      self.trainer.add_policy(parsed_behavior_id, policy)
-     self._learning_team = self.controller.get_learning_team(self.ghost_step)
+     self._learning_team = self.controller.get_learning_team()
      self.wrapped_trainer_team = team_id
  else:
      # for saving/swapping snapshots
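
The second hunk touches snapshot bookkeeping (policy.get_weights() stored so an earlier opponent can be swapped back in). A rough sketch of that pattern, assuming a bounded history; the real _save_snapshot may store and index weights differently.

from collections import deque

class SnapshotStore:
    """Keep a bounded history of past policy weights to sample opponents from."""

    def __init__(self, window: int = 10):  # window size is an assumed default
        self.snapshots = deque(maxlen=window)

    def save(self, policy) -> None:
        # Mirrors the policy.get_weights() call visible in the hunk above.
        self.snapshots.append(policy.get_weights())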

ml-agents/mlagents/trainers/learn.py (10 changes)


      type=int,
      help="Number of parallel environments to use for training",
  )
- argparser.add_argument(
-     "--team-change",
-     default=50000,
-     type=int,
-     help="Number of trainer steps between changing the team_id that is learning",
- )
  argparser.add_argument(
      "--docker-target-name",
      default=None,
  keep_checkpoints: int = parser.get_default("keep_checkpoints")
  base_port: int = parser.get_default("base_port")
  num_envs: int = parser.get_default("num_envs")
- team_change: int = parser.get_default("team_change")
  curriculum_config: Optional[Dict] = None
  lesson: int = parser.get_default("lesson")
  no_graphics: bool = parser.get_default("no_graphics")

  options.keep_checkpoints,
  options.train_model,
  options.load_model,
- options.team_change,
  run_seed,
  maybe_meta_curriculum,
  options.multi_gpu,
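
Runs that previously set this value on the command line (--team-change, default 50000) now need a team_change entry in each self-play behavior's section of trainer_config.yaml, as in the config hunks above. Below is a hypothetical migration helper for an already-loaded config dict; using the presence of swap_steps to spot self-play behaviors is an assumption, not how ml-agents detects them.

def add_team_change(trainer_config: dict, team_change: int = 50000) -> dict:
    """Hypothetical helper, not part of ml-agents: copy the old CLI value into the config."""
    for params in trainer_config.values():
        # Treat sections that already carry self-play settings as self-play behaviors.
        if isinstance(params, dict) and "swap_steps" in params:
            params.setdefault("team_change", team_change)
    return trainer_config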

ml-agents/mlagents/trainers/trainer_util.py (3 changes)


  keep_checkpoints: int,
  train_model: bool,
  load_model: bool,
- team_change: int,
  seed: int,
  meta_curriculum: MetaCurriculum = None,
  multi_gpu: bool = False,

      self.seed = seed
      self.meta_curriculum = meta_curriculum
      self.multi_gpu = multi_gpu
-     self.ghost_controller = GhostController(team_change)
+     self.ghost_controller = GhostController()
  def generate(self, brain_name: str) -> Trainer:
      return initialize_trainer(
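
With team_change gone from the factory, the shared controller is built with no scheduling argument, and the per-team schedule appears to live with each ghost trainer instead. A standalone sketch of that wiring, reusing GhostControllerSketch from the ghost/trainer.py section above; names and signatures other than GhostController are illustrative, not the real API.

class TrainerFactorySketch:
    def __init__(self, seed: int, meta_curriculum=None, multi_gpu: bool = False):
        self.seed = seed
        self.meta_curriculum = meta_curriculum
        self.multi_gpu = multi_gpu
        # One controller shared by every self-play trainer; it no longer needs team_change.
        self.ghost_controller = GhostControllerSketch()

    def generate(self, brain_name: str) -> dict:
        # The real factory delegates to initialize_trainer(...); this sketch only shows
        # that the shared controller would be handed to each trainer it builds.
        return {"brain_name": brain_name, "ghost_controller": self.ghost_controller}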
