
[docs] Provide alternative to PPO jupyter notebook

/develop-generalizationTraining-TrainerController
eshvk, 6 years ago
Current commit
9082f15c
1 changed file with 31 additions and 18 deletions
docs/Getting-Started-with-Balance-Ball.md (49 lines changed)

![Balance Ball](images/balance.png)
This walkthrough uses the **3D Balance Ball** environment. 3D Balance Ball
contains a number of platforms and balls (which are all copies of each other).
Each platform tries to keep its ball from falling by rotating either
horizontally or vertically. In this environment, a platform is an **agent**
that receives a reward for every step that it balances the ball. An agent is
also penalized with a negative reward for dropping the ball.

training process just learns what values are better given particular state
observations based on the rewards received when it tries different values).
For example, an element might represent a force or torque applied to a
Rigidbody in the agent. The **Discrete** action vector space defines its
actions as a table. A specific action given to the agent is an index into
this table.
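
The distinction between the two spaces can be illustrated with a small, self-contained Python sketch (this is not the ML-Agents API; the two-axis layout and the table entries are made up for illustration):

```
import numpy as np

# Continuous action vector space: each element is a real-valued control
# signal, e.g. a torque applied to the platform's Rigidbody around two axes
# (the two-axis layout here is hypothetical).
continuous_action = np.array([0.42, -0.17], dtype=np.float32)

# Discrete action vector space: the agent emits a single index into a
# predefined table of actions.
action_table = ["tilt x +", "tilt x -", "tilt z +", "tilt z -"]
discrete_action = 2                      # index chosen by the agent
chosen = action_table[discrete_action]   # -> "tilt z +"
```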

OpenAI has a recent [blog post](https://blog.openai.com/openai-baselines-ppo/)
explaining it.
To train the agents within the Ball Balance environment, we will be using the Python
package. We have provided a convenient Python wrapper script called `learn.py`, which can be
passed arguments that configure the training.
We will pass to this script the path of the environment executable that we just built. Optionally, we can
use `run_id` to identify the experiment and create a folder where the model and summary statistics are stored. When
using TensorBoard to observe the training statistics, it helps to set this to a sequential value
for each training run.
To summarize, go to your command line, enter the `ml-agents` directory and type:
```
python python/learn.py <env_file_path> --run-id=<run-identifier> --train
```
The `--train` flag tells ML Agents to run in training mode. `env_file_path` should be the path to the Unity executable
that was just created.
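
For example, assuming the executable was built as `3DBall` in the `ml-agents` directory and the run is labeled `firstRun` (both names are hypothetical; substitute your own), the invocation could look like:
```
python python/learn.py 3DBall --run-id=firstRun --train
```
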
Once you start training using `learn.py` as described in the previous section, the `ml-agents` folder will
contain a `summaries` directory. To observe the training process
in more detail, you can use TensorBoard. From the command line, run:
`tensorboard --logdir=summaries`
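
As a rough sketch of where these files end up (only `summaries` and `models/<run-identifier>` are named in this guide; everything else shown is illustrative), the `ml-agents` folder will look something like:
```
ml-agents/
├── models/
│   └── <run-identifier>/   # trained model, eventually including <env_name>.bytes
└── summaries/              # TensorBoard event files read by --logdir=summaries
```
TensorBoard serves its dashboard on port 6006 by default, so once it is running you can open `localhost:6006` in a browser to follow the training statistics.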

### Embedding the trained model into Unity
1. The trained model is stored in `models/<run-identifier>` in the `ml-agents` folder. Once the
training is complete, there will be an `<env_name>.bytes` file in that location, where `<env_name>` is the name
of the executable used during training.
2. Move `<env_name>.bytes` from that location into
`unity-environment/Assets/ML-Agents/Examples/3DBall/TFModels/`.
3. Open the Unity Editor, and select the `3DBall` scene as described above.
4. Select the `Ball3DBrain` object from the Scene hierarchy.
