
[config] Disable `threading` by default (#5221)

* Remove threading as default

* New description

* Remove threaded option from YAML configs

* Remove from Match3
Committed by GitHub, 4 years ago
Commit: 45e75e01
41 files changed: 5 insertions(+), 48 deletions(-)
Changed files (changed lines in parentheses):

 1. config/imitation/Crawler.yaml (1)
 2. config/imitation/Hallway.yaml (1)
 3. config/imitation/PushBlock.yaml (1)
 4. config/poca/DungeonEscape.yaml (1)
 5. config/poca/PushBlockCollab.yaml (1)
 6. config/poca/SoccerTwos.yaml (1)
 7. config/poca/StrikersVsGoalie.yaml (2)
 8. config/ppo/3DBall.yaml (1)
 9. config/ppo/3DBallHard.yaml (1)
10. config/ppo/3DBall_randomize.yaml (1)
11. config/ppo/Basic.yaml (1)
12. config/ppo/Crawler.yaml (1)
13. config/ppo/FoodCollector.yaml (1)
14. config/ppo/GridWorld.yaml (1)
15. config/ppo/Hallway.yaml (1)
16. config/ppo/Match3.yaml (3)
17. config/ppo/PushBlock.yaml (1)
18. config/ppo/Pyramids.yaml (1)
19. config/ppo/PyramidsRND.yaml (1)
20. config/ppo/Sorter_curriculum.yaml (3)
21. config/ppo/Visual3DBall.yaml (1)
22. config/ppo/VisualFoodCollector.yaml (1)
23. config/ppo/Walker.yaml (1)
24. config/ppo/WallJump.yaml (2)
25. config/ppo/WallJump_curriculum.yaml (2)
26. config/ppo/Worm.yaml (1)
27. config/sac/3DBall.yaml (1)
28. config/sac/3DBallHard.yaml (1)
29. config/sac/Basic.yaml (1)
30. config/sac/Crawler.yaml (1)
31. config/sac/FoodCollector.yaml (2)
32. config/sac/GridWorld.yaml (1)
33. config/sac/Hallway.yaml (1)
34. config/sac/PushBlock.yaml (1)
35. config/sac/Pyramids.yaml (1)
36. config/sac/Walker.yaml (1)
37. config/sac/WallJump.yaml (2)
38. config/sac/Worm.yaml (1)
39. docs/Training-Configuration-File.md (2)
40. docs/Training-ML-Agents.md (2)
41. ml-agents/mlagents/trainers/settings.py (2)
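The practical effect: with the default flipped to `false`, model updates no longer overlap environment stepping unless a behavior opts back in explicitly. A minimal sketch of opting back in (the behavior name is hypothetical, not from this commit):

    behaviors:
      MyBehavior:        # hypothetical behavior name
        trainer_type: sac
        threaded: true   # re-enable model updates while the environment steps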

config/imitation/Crawler.yaml (1 change)

     max_steps: 10000000
     time_horizon: 1000
     summary_freq: 30000
-    threaded: true
     behavioral_cloning:
       demo_path: Project/Assets/ML-Agents/Examples/Crawler/Demos/ExpertCrawler.demo
       steps: 50000

config/imitation/Hallway.yaml (1 change)

     max_steps: 10000000
     time_horizon: 64
     summary_freq: 10000
-    threaded: true

config/imitation/PushBlock.yaml (1 change)

     max_steps: 100000
     time_horizon: 64
     summary_freq: 60000
-    threaded: true
     behavioral_cloning:
       demo_path: Project/Assets/ML-Agents/Examples/PushBlock/Demos/ExpertPushBlock.demo
       steps: 50000

config/poca/DungeonEscape.yaml (1 change)

     max_steps: 20000000
     time_horizon: 64
     summary_freq: 60000
-    threaded: true

config/poca/PushBlockCollab.yaml (1 change)

     max_steps: 15000000
     time_horizon: 64
     summary_freq: 60000
-    threaded: true

config/poca/SoccerTwos.yaml (1 change)

     max_steps: 50000000
     time_horizon: 1000
     summary_freq: 10000
-    threaded: false
     self_play:
       save_steps: 50000
       team_change: 200000

config/poca/StrikersVsGoalie.yaml (2 changes)

     max_steps: 30000000
     time_horizon: 1000
     summary_freq: 10000
-    threaded: false
     self_play:
       save_steps: 50000
       team_change: 200000

     max_steps: 30000000
     time_horizon: 1000
     summary_freq: 10000
-    threaded: false
     self_play:
       save_steps: 50000
       team_change: 200000

config/ppo/3DBall.yaml (1 change)

     max_steps: 500000
     time_horizon: 1000
     summary_freq: 12000
-    threaded: true

config/ppo/3DBallHard.yaml (1 change)

     max_steps: 500000
     time_horizon: 1000
     summary_freq: 12000
-    threaded: true

config/ppo/3DBall_randomize.yaml (1 change)

     max_steps: 500000
     time_horizon: 1000
     summary_freq: 12000
-    threaded: true
 environment_parameters:
   mass:
     sampler_type: uniform

config/ppo/Basic.yaml (1 change)

     max_steps: 500000
     time_horizon: 3
     summary_freq: 2000
-    threaded: true

config/ppo/Crawler.yaml (1 change)

     max_steps: 10000000
     time_horizon: 1000
     summary_freq: 30000
-    threaded: true

config/ppo/FoodCollector.yaml (1 change)

     max_steps: 2000000
     time_horizon: 64
     summary_freq: 10000
-    threaded: true

config/ppo/GridWorld.yaml (1 change)

     max_steps: 500000
     time_horizon: 5
     summary_freq: 20000
-    threaded: true

config/ppo/Hallway.yaml (1 change)

     max_steps: 10000000
     time_horizon: 64
     summary_freq: 10000
-    threaded: true

config/ppo/Match3.yaml (3 changes)

   max_steps: 5000000
   time_horizon: 128
   summary_freq: 10000
-  threaded: true
 behaviors:
   Match3SimpleHeuristic:

       num_layers: 1
     max_steps: 5000000
     summary_freq: 10000
-    threaded: true
   Match3SmartHeuristic:
     # Settings can be very simple since we don't care about actually training the model
     trainer_type: ppo

       num_layers: 1
     max_steps: 5000000
     summary_freq: 10000
-    threaded: true

config/ppo/PushBlock.yaml (1 change)

     max_steps: 2000000
     time_horizon: 64
     summary_freq: 60000
-    threaded: true

config/ppo/Pyramids.yaml (1 change)

     max_steps: 10000000
     time_horizon: 128
     summary_freq: 30000
-    threaded: true

config/ppo/PyramidsRND.yaml (1 change)

     max_steps: 3000000
     time_horizon: 128
     summary_freq: 30000
-    threaded: true

config/ppo/Sorter_curriculum.yaml (3 changes)

       epsilon: 0.2
       lambd: 0.95
       num_epoch: 3
-      learning_rate_schedule: constant
+      learning_rate_schedule: constant
     network_settings:
       normalize: False
       hidden_units: 128

     max_steps: 5000000
     time_horizon: 256
     summary_freq: 10000
-    threaded: true
 environment_parameters:
   num_tiles:
     curriculum:

config/ppo/Visual3DBall.yaml (1 change)

     max_steps: 400000
     time_horizon: 64
     summary_freq: 20000
-    threaded: true

config/ppo/VisualFoodCollector.yaml (1 change)

     max_steps: 3000000
     time_horizon: 100
     summary_freq: 40000
-    threaded: true

config/ppo/Walker.yaml (1 change)

     max_steps: 30000000
     time_horizon: 1000
     summary_freq: 30000
-    threaded: true

config/ppo/WallJump.yaml (2 changes)

     max_steps: 20000000
     time_horizon: 128
     summary_freq: 20000
-    threaded: true
   SmallWallJump:
     trainer_type: ppo
     hyperparameters:

     max_steps: 5000000
     time_horizon: 128
     summary_freq: 20000
-    threaded: true

config/ppo/WallJump_curriculum.yaml (2 changes)

     max_steps: 20000000
     time_horizon: 128
     summary_freq: 20000
-    threaded: true
   SmallWallJump:
     trainer_type: ppo
     hyperparameters:

     max_steps: 5000000
     time_horizon: 128
     summary_freq: 20000
-    threaded: true
 environment_parameters:
   big_wall_height:
     curriculum:

config/ppo/Worm.yaml (1 change)

     max_steps: 7000000
     time_horizon: 1000
     summary_freq: 30000
-    threaded: true

config/sac/3DBall.yaml (1 change)

     max_steps: 200000
     time_horizon: 1000
     summary_freq: 12000
-    threaded: true

config/sac/3DBallHard.yaml (1 change)

     max_steps: 500000
     time_horizon: 1000
     summary_freq: 12000
-    threaded: true

config/sac/Basic.yaml (1 change)

     max_steps: 500000
     time_horizon: 10
     summary_freq: 2000
-    threaded: true

config/sac/Crawler.yaml (1 change)

     max_steps: 5000000
     time_horizon: 1000
     summary_freq: 30000
-    threaded: true

config/sac/FoodCollector.yaml (2 changes)

     max_steps: 2000000
     time_horizon: 64
     summary_freq: 60000
-    threaded: true
+    threaded: false

config/sac/GridWorld.yaml (1 change)

     max_steps: 500000
     time_horizon: 5
     summary_freq: 20000
-    threaded: true

config/sac/Hallway.yaml (1 change)

     max_steps: 5000000
     time_horizon: 64
     summary_freq: 10000
-    threaded: true

config/sac/PushBlock.yaml (1 change)

     max_steps: 2000000
     time_horizon: 64
     summary_freq: 100000
-    threaded: true

config/sac/Pyramids.yaml (1 change)

     max_steps: 3000000
     time_horizon: 128
     summary_freq: 30000
-    threaded: true

config/sac/Walker.yaml (1 change)

     max_steps: 15000000
     time_horizon: 1000
     summary_freq: 30000
-    threaded: true

config/sac/WallJump.yaml (2 changes)

     max_steps: 15000000
     time_horizon: 128
     summary_freq: 20000
-    threaded: true
   SmallWallJump:
     trainer_type: sac
     hyperparameters:

     max_steps: 5000000
     time_horizon: 128
     summary_freq: 20000
-    threaded: true

config/sac/Worm.yaml (1 change)

     max_steps: 5000000
     time_horizon: 1000
     summary_freq: 30000
-    threaded: true

docs/Training-Configuration-File.md (2 changes)

 | `keep_checkpoints` | (default = `5`) The maximum number of model checkpoints to keep. Checkpoints are saved after the number of steps specified by the `checkpoint_interval` option. Once the maximum number of checkpoints has been reached, the oldest checkpoint is deleted when a new one is saved. |
 | `checkpoint_interval` | (default = `500000`) The number of experiences collected between checkpoints by the trainer. At most `keep_checkpoints` checkpoints are saved before old ones are deleted. Each checkpoint saves the `.onnx` files in the `results/` folder. |
 | `init_path` | (default = None) Initialize the trainer from a previously saved model. Note that the prior run should have used the same trainer configuration as the current run, and have been saved with the same version of ML-Agents. <br><br>You should provide the full path to the folder where the checkpoints were saved, e.g. `./models/{run-id}/{behavior_name}`. This option is provided in case you want to initialize different behaviors from different runs; in most cases, it is sufficient to use the `--initialize-from` CLI parameter to initialize all models from the same run. |
-| `threaded` | (default = `true`) By default, model updates can happen while the environment is being stepped. This violates the [on-policy](https://spinningup.openai.com/en/latest/user/algorithms.html#the-on-policy-algorithms) assumption of PPO slightly in exchange for a training speedup. To maintain the strict on-policyness of PPO, you can disable parallel updates by setting `threaded` to `false`. There is usually no reason to turn `threaded` off for SAC. |
+| `threaded` | (default = `false`) Allow environments to step while the model is being updated. This might result in a training speedup, especially when using SAC. For best performance, leave this set to `false` when using self-play. |
 | `hyperparameters -> learning_rate` | (default = `3e-4`) Initial learning rate for gradient descent. Corresponds to the strength of each gradient descent update step. This should typically be decreased if training is unstable and the reward does not consistently increase. <br><br>Typical range: `1e-5` - `1e-3` |
 | `hyperparameters -> batch_size` | Number of experiences in each iteration of gradient descent. **This should always be several times smaller than `buffer_size`.** If you are using continuous actions, this value should be large (on the order of thousands). If you are using only discrete actions, this value should be smaller (on the order of tens). <br><br>Typical range: (Continuous, PPO): `512` - `5120`; (Continuous, SAC): `128` - `1024`; (Discrete, PPO & SAC): `32` - `512` |
 | `hyperparameters -> buffer_size` | (default = `10240` for PPO and `50000` for SAC)<br>**PPO:** Number of experiences to collect before updating the policy model, i.e. how many experiences should be collected before any learning or updating of the model happens. **This should be several times larger than `batch_size`.** Typically a larger `buffer_size` corresponds to more stable training updates. <br>**SAC:** The maximum size of the experience buffer, on the order of thousands of times longer than your episodes, so that SAC can learn from old as well as new experiences. <br><br>Typical range: PPO: `2048` - `409600`; SAC: `50000` - `1000000` |
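For orientation, all of these options live at the behavior level of the trainer configuration file. A minimal sketch combining them (behavior name and values are illustrative only, not from this commit):

    behaviors:
      MyBehavior:                  # hypothetical behavior name
        trainer_type: ppo
        hyperparameters:
          learning_rate: 3.0e-4    # initial learning rate
          batch_size: 1024         # several times smaller than buffer_size
          buffer_size: 10240       # experiences collected before each update
        keep_checkpoints: 5
        checkpoint_interval: 500000
        threaded: false            # the new default from this commit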

docs/Training-ML-Agents.md (2 changes)

     summary_freq: 10000
     keep_checkpoints: 5
     checkpoint_interval: 50000
-    threaded: true
+    threaded: false
     init_path: null
     # behavior cloning

ml-agents/mlagents/trainers/settings.py (2 changes)

     max_steps: int = 500000
     time_horizon: int = 64
     summary_freq: int = 50000
-    threaded: bool = True
+    threaded: bool = False
     self_play: Optional[SelfPlaySettings] = None
     behavioral_cloning: Optional[BehavioralCloningSettings] = None
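Since these trainer-settings fields double as the per-behavior YAML schema, a config that omits a key now inherits these defaults. A sketch of a YAML entry equivalent to the defaults above (behavior name hypothetical):

    behaviors:
      MyBehavior:            # hypothetical behavior name
        max_steps: 500000
        time_horizon: 64
        summary_freq: 50000
        threaded: false      # changed default; self_play and behavioral_cloning stay unset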
