![Balance Ball](images/balance.png)

This walkthrough uses the **3D Balance Ball** environment. 3D Balance Ball
contains a number of platforms and balls (which are all copies of each other).
Each platform tries to keep its ball from falling by rotating either
horizontally or vertically. In this environment, a platform is an **agent**
that receives a reward for every step that it balances the ball. An agent is

training process just learns what values are better given particular state
observations based on the rewards received when it tries different values).
For example, an element might represent a force or torque applied to a
Rigidbody in the agent. The **Discrete** action vector space defines its
actions as a table. A specific action given to the agent is an index into
this table.

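
To make the two action spaces concrete, here is a minimal Python sketch. The action values, the table contents, and the helper `lookup_discrete_action` are hypothetical illustrations and not part of the ML-Agents API. The only assumption taken from the text is that a continuous action is a vector of numbers and a discrete action is an index into a table.

```python
# Hypothetical illustration; none of these names come from ML-Agents.

# Continuous: the action is a vector of real values, for example
# torques applied around two axes of the platform.
continuous_action = [0.25, -0.1]

# Discrete: the possible actions are laid out as a table, and the
# agent receives an index into it.
ACTION_TABLE = [
    "rotate_x_positive",
    "rotate_x_negative",
    "rotate_z_positive",
    "rotate_z_negative",
]


def lookup_discrete_action(index):
    """Return the table entry selected by a discrete action index."""
    if not 0 <= index < len(ACTION_TABLE):
        raise IndexError(f"action index {index} is outside the table")
    return ACTION_TABLE[index]


print(lookup_discrete_action(2))
```

In both cases the agent logic decides what the values mean; training only learns which values or indices earn more reward.
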
OpenAI has a recent [blog post](https://blog.openai.com/openai-baselines-ppo/)
explaining it.

To train the agents within the Ball Balance environment, we will use the Python
package. We have provided a convenient Python wrapper script called `learn.py`,
which accepts arguments that configure the training.

1. Open the `python/PPO.ipynb` notebook in Jupyter.
2. Set `env_name` to the name of the environment file you built earlier.
3. (optional) To get the best results quickly, set `max_steps` to
50000, set `buffer_size` to 5000, and set `batch_size` to 512. For this
exercise, this will train the model in approximately 5-10 minutes.
4. (optional) Set `run_path` to a directory of your choice. When using TensorBoard
to observe the training statistics, it helps to set this to a sequential value

We will pass this script the path of the environment executable that we just built. Optionally, we can
use `run_id` to identify the experiment and create a folder where the model and summary statistics are stored. When
using TensorBoard to observe the training statistics, it helps to set this to a sequential value

5. Run all cells of the notebook except the last one, under "Export
the trained TensorFlow graph."

To summarize, go to your command line, enter the `ml-agents` directory and type:

```
python python/learn.py <env_file_path> --run-id=<run-identifier> --train
```

The `--train` flag tells ML-Agents to run in training mode. `env_file_path` should be the path to the Unity executable
that was just created.

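
The same command can also be assembled programmatically, which helps when giving each run a sequential identifier. The following is a minimal sketch: the helper `build_train_command`, the `BalanceBall` prefix, and the example executable path are assumptions for illustration. Only `python/learn.py`, `--run-id`, and `--train` come from the command shown above.

```python
import shlex


def build_train_command(env_file_path, run_number, prefix="BalanceBall"):
    """Build the argument list for one training run.

    `prefix` and the numbering scheme are hypothetical; pick any
    identifier that lets you tell runs apart.
    """
    run_id = f"{prefix}{run_number}"
    return [
        "python",
        "python/learn.py",
        env_file_path,
        f"--run-id={run_id}",
        "--train",
    ]


# "3DBall" is a placeholder for the path to your own executable.
command = build_train_command("3DBall", 1)
print(shlex.join(command))
```

From the `ml-agents` directory, the resulting list could be handed to `subprocess.run(command)` to start training.
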
Once you start training using `learn.py` as described in the previous section, the `ml-agents` folder will
contain a `summaries` directory. To observe the training process
in more detail, you can use TensorBoard. From the command line, run:

`tensorboard --logdir=summaries`

### Embedding the trained model into Unity

1. Run the final cell of the notebook under "Export the trained TensorFlow
graph" to produce an `<env_name>.bytes` file.
1. The trained model is stored in `models/<run-identifier>` in the `ml-agents` folder. Once the
training is complete, there will be a `<env_name>.bytes` file in that location, where `<env_name>` is the name
of the executable used during training.
2. Move `<env_name>.bytes` from `python/models/ppo/` into
`unity-environment/Assets/ML-Agents/Examples/3DBall/TFModels/`.
3. Open the Unity Editor, and select the `3DBall` scene as described above.
4. Select the `Ball3DBrain` object from the Scene hierarchy.