Improvements for GAIL (#2296)
* Don't 0 value bootstrap for GAIL and Curiosity
* Add gradient penalties to GAN to help with stability
* Add gail_config.yaml with GAIL examples
* Cleaned up trainer_config.yaml and unnecessary gammas
* Documentation updates
* Code cleanup
/develop-generalizationTraining-TrainerController
GitHub
6 years ago
Current commit: 6a212f73
10 files changed, with 235 insertions and 59 deletions
35 config/trainer_config.yaml
62 docs/Training-Imitation-Learning.md
6 ml-agents/mlagents/trainers/components/bc/model.py
1 ml-agents/mlagents/trainers/components/reward_signals/curiosity/signal.py
61 ml-agents/mlagents/trainers/components/reward_signals/gail/model.py
1 ml-agents/mlagents/trainers/components/reward_signals/gail/signal.py
3 ml-agents/mlagents/trainers/components/reward_signals/reward_signal.py
12 ml-agents/mlagents/trainers/ppo/policy.py
7 ml-agents/mlagents/trainers/tests/test_ppo.py
106 config/gail_config.yaml
default:
    trainer: ppo
    batch_size: 1024
    beta: 5.0e-3
    buffer_size: 10240
    epsilon: 0.2
    hidden_units: 128
    lambd: 0.95
    learning_rate: 3.0e-4
    max_steps: 5.0e4
    memory_size: 256
    normalize: false
    num_epoch: 3
    num_layers: 2
    time_horizon: 64
    sequence_length: 64
    summary_freq: 1000
    use_recurrent: false
    reward_signals:
        extrinsic:
            strength: 1.0
            gamma: 0.99

PyramidsLearning:
    summary_freq: 2000
    time_horizon: 128
    batch_size: 128
    buffer_size: 2048
    hidden_units: 512
    num_layers: 2
    beta: 1.0e-2
    max_steps: 5.0e5
    num_epoch: 3
    pretraining:
        demo_path: ./demos/ExpertPyramid.demo
        strength: 0.5
        steps: 10000
    reward_signals:
        extrinsic:
            strength: 1.0
            gamma: 0.99
        curiosity:
            strength: 0.02
            gamma: 0.99
            encoding_size: 256
        gail:
            strength: 0.01
            gamma: 0.99
            encoding_size: 128
            demo_path: demos/ExpertPyramid.demo

CrawlerStaticLearning:
    normalize: true
    num_epoch: 3
    time_horizon: 1000
    batch_size: 2024
    buffer_size: 20240
    max_steps: 1e6
    summary_freq: 3000
    num_layers: 3
    hidden_units: 512
    reward_signals:
        gail:
            strength: 1.0
            gamma: 0.99
            encoding_size: 128
            demo_path: demos/ExpertCrawlerSta.demo

PushBlockLearning:
    max_steps: 5.0e4
    batch_size: 128
    buffer_size: 2048
    beta: 1.0e-2
    hidden_units: 256
    summary_freq: 2000
    time_horizon: 64
    num_layers: 2
    reward_signals:
        gail:
            strength: 1.0
            gamma: 0.99
            encoding_size: 128
            demo_path: demos/ExpertPush.demo

HallwayLearning:
    use_recurrent: true
    sequence_length: 64
    num_layers: 2
    hidden_units: 128
    memory_size: 256
    beta: 1.0e-2
    num_epoch: 3
    buffer_size: 1024
    batch_size: 128
    max_steps: 5.0e5
    summary_freq: 1000
    time_horizon: 64
    reward_signals:
        extrinsic:
            strength: 1.0
            gamma: 0.99
        gail:
            strength: 0.1
            gamma: 0.99
            encoding_size: 128
            demo_path: demos/ExpertHallway.demo
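
The file above pairs a `default` section with one section per behavior. The following is a minimal sketch of how such a layout could be resolved. It assumes each behavior's top-level keys replace the matching `default` keys wholesale (a shallow override). That rule is an assumption for illustration and is not shown by this commit. The helper name `resolve_behavior_config` is hypothetical, and the dictionary copies only a few values from the file.

```python
import copy


def resolve_behavior_config(config, behavior_name):
    """Lay a behavior's top-level keys over the `default` section.

    Hypothetical helper: assumes a shallow override, so a nested block such
    as `reward_signals` is replaced as a whole rather than merged key by key.
    """
    resolved = copy.deepcopy(config.get("default", {}))
    resolved.update(copy.deepcopy(config.get(behavior_name, {})))
    return resolved


# A subset of the values from gail_config.yaml, written as plain dictionaries.
config = {
    "default": {
        "trainer": "ppo",
        "batch_size": 1024,
        "buffer_size": 10240,
        "hidden_units": 128,
        "reward_signals": {"extrinsic": {"strength": 1.0, "gamma": 0.99}},
    },
    "PushBlockLearning": {
        "batch_size": 128,
        "buffer_size": 2048,
        "hidden_units": 256,
        "reward_signals": {
            "gail": {
                "strength": 1.0,
                "gamma": 0.99,
                "encoding_size": 128,
                "demo_path": "demos/ExpertPush.demo",
            }
        },
    },
}

push_block = resolve_behavior_config(config, "PushBlockLearning")
print(push_block["trainer"])                 # inherited from `default`
print(push_block["batch_size"])              # overridden by the behavior
print(sorted(push_block["reward_signals"]))  # only the behavior's own signals
```

Under this assumed rule, a section that lists only `gail` under `reward_signals` ends up with no `extrinsic` signal, which would be consistent with PyramidsLearning and HallwayLearning restating `extrinsic` next to `gail`.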