
fix the training doc (#1193)

/develop-generalizationTraining-TrainerController
GitHub · 6 years ago
Current commit
4a881354
2 files changed, 9 insertions(+), 7 deletions(-)
  1. docs/Training-ML-Agents.md (8 changes)
  2. ml-agents/mlagents/trainers/bc/trainer.py (8 changes)

docs/Training-ML-Agents.md (8 changes)


| :-- | :-- | :-- |
| batch_size | The number of experiences in each iteration of gradient descent. | PPO, BC |
| batches_per_epoch | In imitation learning, the number of batches of training examples to collect before training the model. | BC |
- | beta | The strength of entropy regularization. | PPO, BC |
+ | beta | The strength of entropy regularization. | PPO |
- | epsilon | Influences how rapidly the policy can evolve during training. | PPO, BC |
+ | epsilon | Influences how rapidly the policy can evolve during training. | PPO |
| gamma | The reward discount rate for the Generalized Advantage Estimator (GAE). | PPO |
| hidden_units | The number of units in the hidden layers of the neural network. | PPO, BC |
| lambd | The regularization parameter. | PPO |

- | normalize | Whether to automatically normalize observations. | PPO, BC |
- | num_epoch | The number of passes to make through the experience buffer when performing gradient descent optimization. | PPO, BC |
+ | normalize | Whether to automatically normalize observations. | PPO |
+ | num_epoch | The number of passes to make through the experience buffer when performing gradient descent optimization. | PPO |
| num_layers | The number of hidden layers in the neural network. | PPO, BC |
| sequence_length | Defines how long the sequences of experiences must be while training. Only used for training with a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, BC |
| summary_freq | How often, in steps, to save training statistics. This determines the number of data points shown by TensorBoard. | PPO, BC |
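
The rows above drop BC from beta, epsilon, normalize, and num_epoch, leaving them documented as PPO-only. For orientation, here is a minimal sketch of a BC trainer_parameters mapping whose keys mirror the required-key list in bc/trainer.py below; all values are illustrative assumptions, not defaults from the repository:

# A sketch of a BC trainer configuration, as a Python dict. Keys match
# self.param_keys in bc/trainer.py below; values are assumed for
# illustration only.
trainer_parameters = {
    'brain_to_imitate': 'TeacherBrain',  # assumed brain name
    'batch_size': 64,
    'time_horizon': 64,
    'graph_scope': '',
    'summary_freq': 1000,
    'max_steps': 5.0e4,
    'batches_per_epoch': 10,
    'use_recurrent': False,
    'hidden_units': 128,
    'learning_rate': 3.0e-4,             # newly required by this commit
    'num_layers': 2,
    'sequence_length': 64,
    'memory_size': 256,
}

Note that beta, epsilon, normalize, and num_epoch are deliberately absent: per the table change above, they apply only to PPO.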

ml-agents/mlagents/trainers/bc/trainer.py (8 changes)


"""
super(BehavioralCloningTrainer, self).__init__(sess, brain, trainer_parameters, training, run_id)
self.param_keys = ['brain_to_imitate', 'batch_size', 'time_horizon', 'graph_scope',
'summary_freq', 'max_steps', 'batches_per_epoch', 'use_recurrent', 'hidden_units',
'num_layers', 'sequence_length', 'memory_size']
self.param_keys = ['brain_to_imitate', 'batch_size', 'time_horizon',
'graph_scope', 'summary_freq', 'max_steps',
'batches_per_epoch', 'use_recurrent',
'hidden_units','learning_rate', 'num_layers',
'sequence_length', 'memory_size']
for k in self.param_keys:
if k not in trainer_parameters:
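
The diff context cuts off inside the required-key check, so the body of the if branch is not shown. Below is a minimal, self-contained sketch of the same validation pattern; the UnityTrainerException class and message text here are assumptions for illustration (the repository defines its own exception type):

# Sketch of the required-key validation the loop above performs.
class UnityTrainerException(Exception):
    """Assumed exception type for trainer misconfiguration."""

def validate_params(trainer_parameters, param_keys, brain_name):
    # Fail fast on the first hyperparameter missing from the trainer
    # configuration, naming the affected brain in the error message.
    for k in param_keys:
        if k not in trainer_parameters:
            raise UnityTrainerException(
                "The hyperparameter {0} could not be found for brain {1}."
                .format(k, brain_name))

# Example: 'learning_rate' is now required for BC, so a configuration
# without it is rejected.
try:
    validate_params({'batch_size': 64},
                    ['batch_size', 'learning_rate'],
                    'StudentBrain')
except UnityTrainerException as e:
    print(e)  # -> The hyperparameter learning_rate could not be found ...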
