
[docs] Provide alternative to PPO jupyter notebook

/develop-generalizationTraining-TrainerController
eshvk, 6 years ago
Current commit
9082f15c
1 changed file with 31 additions and 18 deletions
docs/Getting-Started-with-Balance-Ball.md (49 lines changed)

![Balance Ball](images/balance.png)
This walkthrough uses the **3D Balance Ball** environment. 3D Balance Ball
contains a number of platforms and balls (which are all copies of each other).
Each platform tries to keep its ball from falling by rotating either
horizontally or vertically. In this environment, a platform is an **agent**
that receives a reward for every step that it balances the ball. An agent is
also penalized with a negative reward for dropping the ball.

training process just learns what values are better given particular state
observations based on the rewards received when it tries different values).
For example, an element might represent a force or torque applied to a
Rigidbody in the agent. The **Discrete** action vector space defines its
actions as a table. A specific action given to the agent is an index into
this table.
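
The distinction between the two spaces can be illustrated with a small, self-contained Python sketch (this is not the ML-Agents API; the two-axis layout and the table entries are made up for illustration):

```
import numpy as np

# Continuous action vector space: each element is a real-valued control
# signal, e.g. a torque applied to the platform's Rigidbody around two axes
# (the two-axis layout here is hypothetical).
continuous_action = np.array([0.42, -0.17], dtype=np.float32)

# Discrete action vector space: the agent emits a single index into a
# predefined table of actions.
action_table = ["tilt x +", "tilt x -", "tilt z +", "tilt z -"]
discrete_action = 2                      # index chosen by the agent
chosen = action_table[discrete_action]   # -> "tilt z +"
```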

OpenAI has a recent [blog post](https://blog.openai.com/openai-baselines-ppo/)
explaining it.
To train the agents within the Ball Balance environment, we will be using the Python
package. We have provided a convenient Python wrapper script called `learn.py`, which can be
passed arguments that configure the training.
We will pass to this script the path of the environment executable that we just built. Optionally, we can
use `run_id` to identify the experiment and create a folder where the model and summary statistics are stored. When
using TensorBoard to observe the training statistics, it helps to set this to a sequential value
for each training run.
To summarize, go to your command line, enter the `ml-agents` directory and type:
```
python python/learn.py <env_file_path> --run-id=<run-identifier> --train
```
The `--train` flag tells ML Agents to run in training mode. `env_file_path` should be the path to the Unity executable
that was just created.
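
For example, assuming the executable was built as `3DBall` in the `ml-agents` directory and the run is labeled `firstRun` (both names are hypothetical; substitute your own), the invocation could look like:
```
python python/learn.py 3DBall --run-id=firstRun --train
```
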
Once you start training using `learn.py` as described in the previous section, the `ml-agents` folder will
contain a `summaries` directory. To observe the training process
in more detail, you can use TensorBoard. From the command line, run:
`tensorboard --logdir=summaries`
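
As a rough sketch of where these files end up (only `summaries` and `models/<run-identifier>` are named in this guide; everything else shown is illustrative), the `ml-agents` folder will look something like:
```
ml-agents/
├── models/
│   └── <run-identifier>/   # trained model, eventually including <env_name>.bytes
└── summaries/              # TensorBoard event files read by --logdir=summaries
```
TensorBoard serves its dashboard on port 6006 by default, so once it is running you can open `localhost:6006` in a browser to follow the training statistics.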

### Embedding the trained model into Unity
1. The trained model is stored in `models/<run-identifier>` in the `ml-agents` folder. Once the
training is complete, there will be an `<env_name>.bytes` file in that location, where `<env_name>` is the name
of the executable used during training.
2. Move `<env_name>.bytes` from that location into
`unity-environment/Assets/ML-Agents/Examples/3DBall/TFModels/`.
3. Open the Unity Editor, and select the `3DBall` scene as described above.
4. Select the `Ball3DBrain` object from the Scene hierarchy.
