
[docs] Rework hyperparameter wordings and alternative to PPO jupyter notebook patches

/develop-generalizationTraining-TrainerController
eshvk, 6 years ago
Current commit: 78906771
2 files changed, with 8 insertions and 10 deletions:
1. docs/Getting-Started-with-Balance-Ball.md (12 changed lines)
2. docs/Training-PPO.md (6 changed lines)

docs/Getting-Started-with-Balance-Ball.md (12 changed lines)


training process just learns what values are better given particular state
observations based on the rewards received when it tries different values).
For example, an element might represent a force or torque applied to a
-Rigid body in the agent. The **Discrete** action vector space defines its
+`RigidBody` in the agent. The **Discrete** action vector space defines its
actions as a table. A specific action given to the agent is an index into
this table.
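A minimal sketch of the distinction described in the hunk above, using plain numpy arrays; the torque axes and the action table are hypothetical and this is not the ML-Agents API:

```python
import numpy as np

# Continuous action vector space: each element is a float that the training
# process tunes, e.g. a torque applied to a rigid body (hypothetical agent).
continuous_action = np.array([0.12, -0.87])    # torque around two axes

# Discrete action vector space: the possible actions form a table, and a
# specific action given to the agent is just an index into that table.
action_table = ["tilt_left", "tilt_right", "tilt_forward", "tilt_back"]
discrete_action = 2                            # selects "tilt_forward"
print(action_table[discrete_action])
```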

To train the agents within the Ball Balance environment, we will be using the python
-package. We have provided a convenient python wrapper script called `learn.py` which can be passed
-arguments which are used to configure the training.
+package. We have provided a convenient python wrapper script called `learn.py` which accepts arguments used to configure both training and inference phases.
We will pass to this script the path of the environment executable that we just built. (Optionally) We can

```
-The `--train` flag tells ML Agents to run in training mode. `env_file_path` should be the path to the Unity executable
-that was just created.
+The `--train` flag tells ML-Agents to run in training mode. `env_file_path` should be the path to the Unity executable that was just created.
### Observing Training Progress

in more detail, you can use TensorBoard. From the command line run:
`tensorboard --logdir=summaries`
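Taken together, the lines above describe launching `learn.py` with the executable path and the `--train` flag, then watching progress in TensorBoard. The sketch below assembles that into a runnable Python snippet; the executable path, the run identifier value, and the exact `--run-id` flag spelling are assumptions rather than values from the docs, and the real CLI may differ between ML-Agents versions.

```python
import subprocess

# Hypothetical invocation assembled from the surrounding description: pass the
# path of the Unity executable that was just built, plus the --train flag.
env_file_path = "3DBall"        # path to the Unity executable (assumed name)
run_identifier = "first-run"    # shows up later as models/<run-identifier>

command = ["python3", "learn.py", env_file_path, f"--run-id={run_identifier}", "--train"]
print(" ".join(command))        # inspect the command before launching it

# Launch training; statistics are written to the summaries folder, which is
# what `tensorboard --logdir=summaries` reads while training runs.
subprocess.run(command, check=True)
```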

### Embedding the trained model into Unity
-1. The trained model is stored in `models/<run-identifier` in the `ml-agents` folder. Once the
+1. The trained model is stored in `models/<run-identifier>` in the `ml-agents` folder. Once the
training is complete, there will be a `<env_name>.bytes` file in that location where `<env_name>` is the name
of the executable used during training.
2. Move `<env_name>.bytes` from `python/models/ppo/` into
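Step 2 above is cut off in this excerpt. Purely as an illustration of the file move it starts to describe, the sketch below copies the trained `.bytes` file out of `python/models/ppo/`; the environment name and the Unity destination folder are hypothetical placeholders, not paths from the docs.

```python
import shutil
from pathlib import Path

# Hypothetical paths: <env_name> and the destination inside the Unity project
# are placeholders, since step 2 above is truncated in this excerpt.
env_name = "3DBall"
trained_model = Path("python/models/ppo") / f"{env_name}.bytes"
unity_destination = Path("unity-project/Assets/TFModels")   # placeholder folder

unity_destination.mkdir(parents=True, exist_ok=True)
shutil.copy2(trained_model, unity_destination / trained_model.name)
print(f"Copied {trained_model} to {unity_destination}")
```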

docs/Training-PPO.md (6 changed lines)


`max_steps` corresponds to how many steps of the simulation (multiplied by frame-skip) are run during the training process. This value should be increased for more complex problems.
-Typical Range: `5e5 - 1e7`
+Typical Range: `5e5` - `1e7`
-#### Beta (Used only in Discrete Control)
+#### Beta
-`beta` corresponds to the strength of the entropy regularization, which makes the policy "more random." This ensures that discrete action space agents properly explore during training. Increasing this will ensure more random actions are taken. This should be adjusted such that the entropy (measurable from TensorBoard) slowly decreases alongside increases in reward. If entropy drops too quickly, increase `beta`. If entropy drops too slowly, decrease `beta`.
+`beta` corresponds to the strength of the entropy regularization, which makes the policy "more random." This ensures that agents properly explore the action space during training. Increasing this will ensure more random actions are taken. This should be adjusted such that the entropy (measurable from TensorBoard) slowly decreases alongside increases in reward. If entropy drops too quickly, increase `beta`. If entropy drops too slowly, decrease `beta`.
Typical Range: `1e-4` - `1e-2`
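To make the two reworded hyperparameter descriptions concrete, here is a generic sketch of an entropy bonus added to a policy loss, plus the `max_steps` arithmetic; this is not ML-Agents' actual loss code, and the surrogate loss value, the frame-skip of 4, and the variable names are assumptions.

```python
import numpy as np

def entropy(probs):
    """Entropy of a categorical policy; higher entropy means a more random policy."""
    probs = np.clip(probs, 1e-8, 1.0)
    return -np.sum(probs * np.log(probs))

# Hypothetical numbers: a policy over four actions and a placeholder surrogate loss.
policy_probs = np.array([0.70, 0.15, 0.10, 0.05])
policy_loss = 0.42
beta = 1e-3                     # within the typical range 1e-4 - 1e-2

# Entropy regularization: subtracting beta * entropy lowers the loss for more
# random policies, so a larger beta keeps the agent exploring longer.
total_loss = policy_loss - beta * entropy(policy_probs)
print(f"entropy={entropy(policy_probs):.3f}  total_loss={total_loss:.5f}")

# `max_steps` budget: simulation steps times frame-skip (frame-skip of 4 is assumed).
max_steps, frame_skip = 5e5, 4
print(f"total engine frames ~ {max_steps * frame_skip:.0f}")
```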
