
[docs] Rework hyperparameter wordings and alternative to PPO jupyter notebook patches

/develop-generalizationTraining-TrainerController
eshvk, 6 years ago
Current commit: 78906771
2 files changed, with 8 insertions and 10 deletions:
1. docs/Getting-Started-with-Balance-Ball.md (12 changed lines)
2. docs/Training-PPO.md (6 changed lines)

docs/Getting-Started-with-Balance-Ball.md (12 changed lines)


training process just learns what values are better given particular state
observations based on the rewards received when it tries different values).
For example, an element might represent a force or torque applied to a
-Rigid body in the agent. The **Discrete** action vector space defines its
+`RigidBody` in the agent. The **Discrete** action vector space defines its
actions as a table. A specific action given to the agent is an index into
this table.
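A minimal sketch of the distinction described in the hunk above, using plain numpy arrays; the torque axes and the action table are hypothetical and this is not the ML-Agents API:

```python
import numpy as np

# Continuous action vector space: each element is a float that the training
# process tunes, e.g. a torque applied to a rigid body (hypothetical agent).
continuous_action = np.array([0.12, -0.87])    # torque around two axes

# Discrete action vector space: the possible actions form a table, and a
# specific action given to the agent is just an index into that table.
action_table = ["tilt_left", "tilt_right", "tilt_forward", "tilt_back"]
discrete_action = 2                            # selects "tilt_forward"
print(action_table[discrete_action])
```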

To train the agents within the Ball Balance environment, we will be using the python
-package. We have provided a convenient python wrapper script called `learn.py` which can be passed
-arguments which are used to configure the training.
+package. We have provided a convenient python wrapper script called `learn.py` which accepts arguments used to configure both training and inference phases.
We will pass to this script the path of the environment executable that we just built. (Optionally) We can

```
-The `--train` flag tells ML Agents to run in training mode. `env_file_path` should be the path to the Unity executable
-that was just created.
+The `--train` flag tells ML-Agents to run in training mode. `env_file_path` should be the path to the Unity executable that was just created.
### Observing Training Progress

in more detail, you can use TensorBoard. From the command line run:
`tensorboard --logdir=summaries`
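Taken together, the lines above describe launching `learn.py` with the executable path and the `--train` flag, then watching progress in TensorBoard. The sketch below assembles that into a runnable Python snippet; the executable path, the run identifier value, and the exact `--run-id` flag spelling are assumptions rather than values from the docs, and the real CLI may differ between ML-Agents versions.

```python
import subprocess

# Hypothetical invocation assembled from the surrounding description: pass the
# path of the Unity executable that was just built, plus the --train flag.
env_file_path = "3DBall"        # path to the Unity executable (assumed name)
run_identifier = "first-run"    # shows up later as models/<run-identifier>

command = ["python3", "learn.py", env_file_path, f"--run-id={run_identifier}", "--train"]
print(" ".join(command))        # inspect the command before launching it

# Launch training; statistics are written to the summaries folder, which is
# what `tensorboard --logdir=summaries` reads while training runs.
subprocess.run(command, check=True)
```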

### Embedding the trained model into Unity
-1. The trained model is stored in `models/<run-identifier` in the `ml-agents` folder. Once the
+1. The trained model is stored in `models/<run-identifier>` in the `ml-agents` folder. Once the
training is complete, there will be a `<env_name>.bytes` file in that location where `<env_name>` is the name
of the executable used during training.
2. Move `<env_name>.bytes` from `python/models/ppo/` into
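Step 2 above is cut off in this excerpt. Purely as an illustration of the file move it starts to describe, the sketch below copies the trained `.bytes` file out of `python/models/ppo/`; the environment name and the Unity destination folder are hypothetical placeholders, not paths from the docs.

```python
import shutil
from pathlib import Path

# Hypothetical paths: <env_name> and the destination inside the Unity project
# are placeholders, since step 2 above is truncated in this excerpt.
env_name = "3DBall"
trained_model = Path("python/models/ppo") / f"{env_name}.bytes"
unity_destination = Path("unity-project/Assets/TFModels")   # placeholder folder

unity_destination.mkdir(parents=True, exist_ok=True)
shutil.copy2(trained_model, unity_destination / trained_model.name)
print(f"Copied {trained_model} to {unity_destination}")
```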

docs/Training-PPO.md (6 changed lines)


`max_steps` corresponds to how many steps of the simulation (multiplied by frame-skip) are run during the training process. This value should be increased for more complex problems.
-Typical Range: `5e5 - 1e7`
+Typical Range: `5e5` - `1e7`
-#### Beta (Used only in Discrete Control)
+#### Beta
-`beta` corresponds to the strength of the entropy regularization, which makes the policy "more random." This ensures that discrete action space agents properly explore during training. Increasing this will ensure more random actions are taken. This should be adjusted such that the entropy (measurable from TensorBoard) slowly decreases alongside increases in reward. If entropy drops too quickly, increase `beta`. If entropy drops too slowly, decrease `beta`.
+`beta` corresponds to the strength of the entropy regularization, which makes the policy "more random." This ensures that agents properly explore the action space during training. Increasing this will ensure more random actions are taken. This should be adjusted such that the entropy (measurable from TensorBoard) slowly decreases alongside increases in reward. If entropy drops too quickly, increase `beta`. If entropy drops too slowly, decrease `beta`.
Typical Range: `1e-4` - `1e-2`
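To make the two reworded hyperparameter descriptions concrete, here is a generic sketch of an entropy bonus added to a policy loss, plus the `max_steps` arithmetic; this is not ML-Agents' actual loss code, and the surrogate loss value, the frame-skip of 4, and the variable names are assumptions.

```python
import numpy as np

def entropy(probs):
    """Entropy of a categorical policy; higher entropy means a more random policy."""
    probs = np.clip(probs, 1e-8, 1.0)
    return -np.sum(probs * np.log(probs))

# Hypothetical numbers: a policy over four actions and a placeholder surrogate loss.
policy_probs = np.array([0.70, 0.15, 0.10, 0.05])
policy_loss = 0.42
beta = 1e-3                     # within the typical range 1e-4 - 1e-2

# Entropy regularization: subtracting beta * entropy lowers the loss for more
# random policies, so a larger beta keeps the agent exploring longer.
total_loss = policy_loss - beta * entropy(policy_probs)
print(f"entropy={entropy(policy_probs):.3f}  total_loss={total_loss:.5f}")

# `max_steps` budget: simulation steps times frame-skip (frame-skip of 4 is assumed).
max_steps, frame_skip = 5e5, 4
print(f"total engine frames ~ {max_steps * frame_skip:.0f}")
```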
