
Edits for 0.3

/develop-generalizationTraining-TrainerController
Joe Ward, 7 years ago
Current commit
ac5e6bc7
16 files changed, with 293 insertions and 203 deletions
  1. docs/Feature-Broadcasting.md (5 changes)
  2. docs/Feature-Memory.md (4 changes)
  3. docs/Learning-Environment-Design-Brains.md (29 changes)
  4. docs/Learning-Environment-Design-CoreBrains.md (4 changes)
  5. docs/Learning-Environment-Design-Player-Brains.md (23 changes)
  6. docs/Learning-Environment-Design.md (2 changes)
  7. docs/Learning-Environment-Examples.md (2 changes)
  8. docs/Python-API.md (18 changes)
  9. docs/Training-ML-Agents.md (93 changes)
  10. docs/Training-PPO.md (16 changes)
  11. docs/Using-TensorFlow-Sharp-in-Unity.md (118 changes)
  12. docs/Using-Tensorboard.md (53 changes)
  13. docs/dox-ml-agents.conf (15 changes)
  14. docs/Learning-Environment-Design-External-Internal-Brains.md (62 changes)
  15. docs/Learning-Environment-Heuristic-Brains.md (2 changes)
  16. docs/Learning-Environment-Design-Internal-Brains.md (50 changes)

docs/Feature-Broadcasting.md (5 changes)


# Using the Broadcast Feature
## How to use: Unity
To turn it on in Unity, simply check the `Broadcast` box as shown below:

Note that when you do a `step` on the environment, you cannot provide actions for non-external brains. If there are no external brains in the scene, simply call `step()` with no arguments.
You can use the broadcast feature to collect data generated by Player, Heuristic, or Internal brain game sessions. You can then use this data to train an agent in a supervised context.
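As an illustrative sketch (not part of the original page), the loop below shows how such data might be collected from Python; the environment file name, the brain name `PlayerBrain`, and the number of steps are assumptions made for the example.
```python
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="3DBall")  # assumed build name

# With no external brains in the scene, step() is called with no arguments.
info = env.reset(train_mode=False)
for _ in range(100):
    info = env.step()                  # dict mapping brain names to BrainInfo
    player_info = info["PlayerBrain"]  # assumed name of a broadcasting Player brain
    # The observations and rewards generated by the human player could be
    # stored here and later used to train an agent in a supervised setting.
    print(player_info.rewards)
env.close()
```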

docs/Feature-Memory.md (4 changes)


# Using Recurrent Neural Networks in ML-Agents
## What are memories for?
Have you ever entered a room to get something and immediately forgot

memory_size: 256
```
* `use_recurrent` is a flag that notifies the trainer that you want
to use a Recurrent Neural Network.
* `sequence_length` defines how long the sequences of experiences
must be while training. In order to use an LSTM, training requires

docs/Learning-Environment-Design-Brains.md (29 changes)


Use the Brain class directly, rather than a subclass. Brain behavior is determined by the **Brain Type**. ML-Agents defines four Brain Types:
* [External](Learning-Environment-External-Internal-Brains.md) — The **External** and **Internal** types typically work together; set **External** when training your agents. You can also use the **External** brain to communicate with a Python script via the Python `UnityEnvironment` class included in the Python portion of the ML-Agents SDK.
* [Internal](Learning-Environment-External-Internal-Brains.md) – Set **Internal** to make use of a trained model.
* [Heuristic](Learning-Environment-Heuristic-Brains.md) – Set **Heuristic** to hand-code the agent's logic by extending the Decision class.
* [Player](Learning-Environment-Player-Brains.md) – Set **Player** to map keyboard keys to agent actions, which can be useful to test your agent code.

* `Type of Brain` - Describes how the Brain will decide actions.
* `External` - Actions are decided by an external process, such as the PPO training process.
* `Internal` - Actions are decided using an internal TensorFlowSharp model.
* `Player` - Actions are decided using keyboard input mappings.
* `Heuristic` - Actions are decided using a custom `Decision` script, which must be attached to the Brain game object.
### Internal Brain
![Internal Brain Inspector](images/internal_brain.png)
* `Graph Model` : This must be the `bytes` file corresponding to the pretrained TensorFlow graph. (You must first drag this file into your Resources folder and then from the Resources folder into the inspector.)
* `Graph Scope` : If you set a scope while training your TensorFlow model, all your placeholder names will have a prefix. You must specify that prefix here.
* `Batch Size Node Name` : If the batch size is one of the inputs of your graph, you must specify the name of the placeholder here. The brain will automatically make the batch size equal to the number of agents connected to the brain.
* `Vector Observation Node Name` : If your graph uses a vector observation as an input, you must specify the name of the placeholder here.
* `Recurrent Input Node Name` : If your graph uses a recurrent input / memory as input and outputs new recurrent input / memory, you must specify the name of the input placeholder here.
* `Recurrent Output Node Name` : If your graph uses a recurrent input / memory as input and outputs new recurrent input / memory, you must specify the name of the output placeholder here.
* `Visual Observation Placeholder Name` : If your graph uses visual observations as input, you must specify their names here. Note that the number of observations is equal to the length of `Camera Resolutions` in the brain parameters.
* `Action Node Name` : Specify the name of the placeholder corresponding to the actions of the brain in your graph. If the action space type is continuous, the output must be a one-dimensional tensor of floats of length `Action Space Size`; if the action space type is discrete, the output must be a one-dimensional tensor of ints of length 1.
* `Graph Placeholder` : If your graph takes additional inputs that are fixed (for example, a noise level), you can specify them here. Note that in your graph, these must correspond to one-dimensional tensors of int or float of size 1.
* `Name` : Corresponds to the name of the placeholder.
* `Value Type` : Either Integer or Floating Point.
* `Min Value` and `Max Value` : Specify the range of the value here. The value will be sampled from the uniform distribution ranging from `Min Value` to `Max Value` inclusive.
### Player Brain
The Brain property settings must match the Agent implementation. For example, if you specify that the Brain use the **Continuous State Space** and a **State Size** of 23, then the Agent must provide a state vector with 23 elements. See [Agents](Learning-Environment-Design-Agents.md) for more information about programming agents.

docs/Learning-Environment-Design-CoreBrains.md (4 changes)


When the Brain creates an instance of your CoreBrain, it appends the enum name to the string "CoreBrain". Thus, the class name for the Internal brain is `CoreBrainInternal`. If you created a class named `CoreBrainFuzzyLogic`, you would add an enum named "FuzzyLogic" to the BrainType enum.
<!--
Once you have determined that the existing CoreBrain implementations do not fill your needs, you can implement your own. Use `SendState()` to collect the observations from your agents and store them for use in `DecideAction()`.
-->

docs/Learning-Environment-Design-Player-Brains.md (23 changes)


# Player Brain
The **Player** brain type allows you to control an agent using keyboard commands. You can use Player brains to control a "teacher" agent that trains other agents during [imitation learning](Training-Imitation-Learning.md). You can also use Player brains to test your agents and environment before changing their brain types to **External** and running the training process.
The **Player** brain properties allow you to assign one or more keyboard keys to each action and a unique value to send when a key is pressed.
If the action space is discrete, you must map input keys to their corresponding integer values. If the action space is continuous, you must map input keys to their corresponding indices and float values.
Note the differences between the discrete and continuous action spaces. When a brain uses the discrete action space, you can send one integer value as the action per step. In contrast, when a brain uses the continuous action space you can send any number of floating point values (up to the **Vector Action Space Size** setting).
| **Property** | | **Description** |
| :-- |:-- | :-- |
|**Continuous Player Actions**|| The mapping for the continuous vector action space. Shown when the action space is **Continuous**.|
|| **Size** | The number of key commands defined. You can assign more than one command to the same action index in order to send different values for that action. (If you press both keys at the same time, deterministic results are not guaranteed.)|
||**Element 0–N**| The mapping of keys to action values. |
|| **Key** | The key on the keyboard. |
|| **Index** | The element of the agent's action vector to set when this key is pressed. The index value cannot exceed the size of the Action Space (minus 1, since it is an array index).|
|| **Value** | The value to send to the agent as its action for the specified index when the mapped key is pressed. All other members of the action vector are set to 0. |
|**Discrete Player Actions**|| The mapping for the discrete vector action space. Shown when the action space is **Discrete**.|
|| **Default Action** | The value to send when no keys are pressed.|
|| **Size** | The number of key commands defined. |
||**Element 0–N**| The mapping of keys to action values. |
|| **Key** | The key on the keyboard. |
|| **Value** | The value to send to the agent as its action when the mapped key is pressed.|
For more information about the Unity input system, see [Input](https://docs.unity3d.com/ScriptReference/Input.html).

docs/Learning-Environment-Design.md (2 changes)


Training and simulation proceed in steps orchestrated by the ML-Agents Academy class. The Academy works with Agent and Brain objects in the scene to step through the simulation. When either the Academy has reached its maximum number of steps or all agents in the scene are _done_, one training episode is finished.
During training, the external Python training process communicates with the Academy to run a series of episodes while it collects data and optimizes its neural network model. The type of Brain assigned to an agent determines whether it participates in training or not. The **External** brain communicates with the external process to train the TensorFlow model. When training is completed successfully, you can add the trained model file to your Unity project for use with an **Internal** brain.
The ML-Agents Academy class orchestrates the agent simulation loop as follows:

docs/Learning-Environment-Examples.md (2 changes)


![Hallway](images/hallway.png)
* Set-up: Environment where the agent needs to find information in a room, remember it, and use it to move to the correct goal.
* Goal: Move to the goal which corresponds to the color of the block in the room.
* Agents: The environment contains one agent linked to a single brain.
* Agent Reward Function (independent):

docs/Python-API.md (18 changes)


# Python API
ML-Agents provides a Python API for controlling the agent simulation loop of an environment or game built with Unity. This API is used by the ML-Agents training algorithms (run with `learn.py`), but you can also write your own Python programs using this API.
The key objects in the Python API include:
* **UnityEnvironment** — the main interface between the Unity application and your code. Use UnityEnvironment to start and control a simulation or training session.
* **BrainInfo** — contains all the data from agents in the simulation, such as observations and rewards.
* **BrainParameters** — describes the data elements in a BrainInfo object. For example, provides the array length of an observation in BrainInfo.
These classes are all defined in the `python/unityagents` folder of the ML-Agents SDK.
To communicate with an agent in a Unity environment from a Python program, the agent must either use an **External** brain or use a brain that is broadcasting (has its **Broadcast** property set to true). Your code is expected to return actions for agents with external brains, but can only observe broadcasting brains (the information you receive for an agent is the same in both cases). See [Using the Broadcast Feature](Feature-Broadcasting.md).
For a simple example of using the Python API to interact with a Unity environment, see the Basic [Jupyter](Background-Jupyter.md) notebook, which opens an environment, runs a few simulation steps taking random actions, and closes the environment.
Python-side communication happens through `UnityEnvironment`, which is located in `python/unityagents`. To load a Unity environment from a built binary file, put the file in the same directory as `unityagents`. For example, if the filename of your Unity environment is 3DBall.app, in Python, run:
```python
from unityagents import UnityEnvironment

* **`agents`** : A list of the unique ids of the agents using the brain.
* **`previous_actions`** : A two dimensional numpy array of dimension `(batch size, vector action size)` if the vector action space is continuous and `(batch size, 1)` if the vector action space is discrete.
Once loaded, you can use your UnityEnvironment object (referenced by the variable `env` in this example) in the following ways:
- **Print : `print(str(env))`**
Prints all parameters relevant to the loaded environment and the external brains.
- **Reset : `env.reset(train_mode=True, config=None)`**
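Putting these calls together, a minimal interaction loop might look like the sketch below (not from the original page); the file name `3DBall`, the assumption that the first brain is an External brain, and a vector action size of 1 are simplifications made for brevity.
```python
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="3DBall")      # assumed build name
print(str(env))                                 # print environment and brain parameters

brain_name = env.brain_names[0]                 # first brain in the scene (assumed External)
info = env.reset(train_mode=True)[brain_name]   # BrainInfo for that brain

for _ in range(10):
    # Illustration only: send an all-zero action for every agent using this
    # brain, assuming a vector action size of 1. A real controller would
    # compute actions from the observations in the BrainInfo object.
    actions = [0.0] * len(info.agents)
    info = env.step(actions)[brain_name]
    print(info.rewards)

env.close()
```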

docs/Training-ML-Agents.md (93 changes)


# Training ML-Agents
ML-Agents conducts training using an external Python training process. During training, this external process communicates with the Academy object in the Unity scene to generate a block of agent experiences. These experiences become the training set for a neural network used to optimize the agent's policy (which is essentially a mathematical function mapping observations to actions). In reinforcement learning, the neural network optimizes the policy by maximizing the expected rewards. In imitation learning, the neural network optimizes the policy to achieve the smallest difference between the actions chosen by the agent trainee and the actions chosen by the expert in the same situation.
The output of the training process is a model file containing the optimized policy. This model file is a TensorFlow data graph containing the mathematical operations and the optimized weights selected during the training process. You can use the generated model file with the Internal Brain type in your Unity project to decide the best course of action for an agent.

## Training with Learn.py
Use the Python `Learn.py` program to train agents. `Learn.py` supports training with [reinforcement learning](Background-Machine-Learning.md#reinforcement-learning), [curriculum learning](Training-Curriculum-Learning.md), and [behavioural cloning imitation learning](Training-Imitation-Learning.md).
Run `Learn.py` from the command line to launch the training process. Use the command line patterns and the `trainer_config.yaml` file to control training options.

In addition to passing the path of the Unity executable containing your training environment, you can set the following command-line options when invoking `learn.py` (an example invocation is shown after the list):
* `--curriculum=<file>` – Specify a curriculum json file for defining the lessons for curriculum training. See [Curriculum Training](Training-Curriculum-Learning.md) for more information.
* `--keep-checkpoints=<n>` – Specify the maximum number of model checkpoints to keep. Checkpoints are saved after the number of steps specified by the `save-freq` option. Once the maximum number of checkpoints has been reached, the oldest checkpoint is deleted when saving a new checkpoint. Defaults to 5.
* `--load` – If set, the training code loads an already trained model to initialize the neural network before training. The learning code looks for the model in `python/models/<run-id>/` (which is also where it saves models at the end of training). When not set (the default), the neural network weights are randomly initialized and an existing model is not loaded.
* `--run-id=<path>` – Specifies an identifier for each training run. This identifier is used to name the subdirectories in which the trained model and summary statistics are saved, as well as the saved model itself. The default id is "ppo". If you use TensorBoard to view the training statistics, always set a unique run-id for each training run. (The statistics for all runs with the same id are combined as if they were produced by the same session.)
* `--save-freq=<n>` – Specifies how often (in steps) to save the model during training. Defaults to 50000.
* `--seed=<n>` – Specifies a number to use as a seed for the random number generator used by the training code.
* `--slow` – Specify this option to run the Unity environment at normal, game speed. The `--slow` mode uses the **Time Scale** and **Target Frame Rate** specified in the Academy's **Inference Configuration**. By default, training runs using the speeds specified in your Academy's **Training Configuration**. See [Academy Properties](Learning-Environment-Design-Academy.md#academy-properties).
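For example, a training session might be launched with a command along the following lines (the environment path and run id here are placeholders, and the exact set of required flags can differ between versions, so check `python3 learn.py --help` for the full usage):

`python3 learn.py <env_file_path> --run-id=firstRun --save-freq=50000`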

The training config file, `trainer_config.yaml` specifies the training method, the hyperparameters, and a few additional values to use during training. The file is divided into sections. The **default** section defines the default values for all the available settings. You can also add new sections to override these defaults to train specific Brains. Name each of these override sections after the GameObject containing the Brain component that should use these settings. (This GameObject will be a child of the Academy in your scene.) Sections for the example environments are included in the provided config file. `Learn.py` finds the config file by name and looks for it in the same directory as itself.
| **Setting** | **Description** | **Applies To Trainer**|
| :-- | :-- | :-- |
| batch_size | The number of experiences in each iteration of gradient descent.| PPO, BC |
| batches_per_epoch | In imitation learning, the number of batches of training examples to collect before training the model.| BC |
| beta | The strength of entropy regularization.| PPO, BC |
| brain_to_imitate | For imitation learning, the name of the GameObject containing the Brain component to imitate. | BC |
| buffer_size | The number of experiences to collect before updating the policy model. | PPO, BC |
| epsilon | Influences how rapidly the policy can evolve during training.| PPO, BC |
| gamma | The reward discount rate for the Generalized Advantage Estimator (GAE). | PPO |
| hidden_units | The number of units in the hidden layers of the neural network. | PPO, BC |
| lambd | The regularization parameter. | PPO |
| learning_rate | The initial learning rate for gradient descent. | PPO, BC |
| max_steps | The maximum number of simulation steps to run during a training session. | PPO, BC |
| memory_size | The size of the memory an agent must keep. Used for training with a recurrent neural network. See [Using Recurrent Neural Networks in ML-Agents](Feature-Memory.md). | PPO, BC |
| normalize | Whether to automatically normalize observations. | PPO, BC |
| num_epoch | The number of passes to make through the experience buffer when performing gradient descent optimization. | PPO, BC |
| num_layers | The number of hidden layers in the neural network. | PPO, BC |
| sequence_length | Defines how long the sequences of experiences must be while training. Only used for training with a recurrent neural network. See [Using Recurrent Neural Networks in ML-Agents](Feature-Memory.md). | PPO, BC |
| summary_freq | How often, in steps, to save training statistics. This determines the number of data points shown by Tensorboard. | PPO, BC |
| time_horizon | How many steps of experience to collect per-agent before adding it to the experience buffer. | PPO, BC |
| trainer | The type of training to perform: "ppo" or "imitation".| PPO, BC |
| use_recurrent | Train using a recurrent neural network. See [Using Recurrent Neural Networks in ML-Agents](Feature-Memory.md).| PPO, BC |
|| PPO = Proximal Policy Optimization, BC = Behavioral Cloning (Imitation) ||
## Observing Training Progress
Once you start training using `learn.py` in the way described in the previous section, the `ml-agents` folder will
contain a `summaries` directory. In order to observe the training process
in more detail, you can use TensorBoard. From the command line, run:
`tensorboard --logdir=summaries`
Then navigate to `localhost:6006`.
From TensorBoard, you will see the summary statistics:
* Lesson - only interesting when performing
[curriculum training](Training-Curriculum-Learning.md).
This is not used in the 3d Balance Ball environment.
* Cumulative Reward - The mean cumulative episode reward over all agents.
Should increase during a successful training session.
* Entropy - How random the decisions of the model are. Should slowly decrease
during a successful training process. If it decreases too quickly, the `beta`
hyperparameter should be increased.
* Episode Length - The mean length of each episode in the environment for all
agents.
* Learning Rate - How large a step the training algorithm takes as it searches
for the optimal policy. Should decrease over time.
* Policy Loss - The mean loss of the policy function update. Correlates to how
much the policy (process for deciding actions) is changing. The magnitude of
this should decrease during a successful training session.
* Value Estimate - The mean value estimate for all states visited by the agent.
Should increase during a successful training session.
* Value Loss - The mean loss of the value function update. Correlates to how
well the model is able to predict the value of each state. This should decrease
during a successful training session.
For specific advice on setting hyperparameters based on the type of training you are conducting, see:
* [Training with PPO](Training-PPO.md)
* [Using Recurrent Neural Networks in ML-Agents](Feature-Memory.md)
* [Imitation Learning](Training-Imitation-Learning.md)
* [Training with Curriculum Learning](Training-Curriculum-Learning.md)
![Example TensorBoard Run](images/mlagents-TensorBoard.png)
You can also compare the [example environments](Learning-Environment-Examples.md) to the corresponding sections of the `trainer_config.yaml` file for each example to see how the hyperparameters and other configuration variables have been changed from the defaults.

docs/Training-PPO.md (16 changes)


# Training with Proximal Policy Optimization
ML-Agents uses a reinforcement learning technique called [Proximal Policy Optimization (PPO)](https://blog.openai.com/openai-baselines-ppo/). PPO uses a neural network to approximate the ideal function that maps an agent's observations to the best action an agent can take in a given state. The ML-Agents PPO algorithm is implemented in TensorFlow and runs in a separate Python process (communicating with the running Unity application over a socket).
See [Training ML-Agents](Training-ML-Agents.md) for instructions on running the training program, `learn.py`.
If you are using the recurrent neural network (RNN) to utilize memory, see [Using Recurrent Neural Networks in ML-Agents](Feature-Memory.md) for RNN-specific training details.
If you are using curriculum training to pace the difficulty of the learning task presented to an agent, see [Training with Curriculum Learning](Training-Curriculum-Learning.md).
For information about imitation learning, which uses a different training algorithm, see [Imitation Learning](Training-Imitation-Learning.md).
<!-- Need a description of PPO that provides a general overview of the algorithm and, more specifically, puts all the hyperparameters and Academy/Brain/Agent settings (like max_steps and done) into context. Oh, and which is also short and understandable by laymen. -->
Successfully training a Reinforcement Learning model often involves tuning the training hyperparameters. This guide contains some best practices for tuning the training process when the default parameters don't seem to be giving the level of performance you would like.
### Hyperparameters

docs/Using-TensorFlow-Sharp-in-Unity.md (118 changes)


# Using TensorFlowSharp in Unity _[Experimental]_
ML-Agents allows you to use pre-trained [TensorFlow graphs](https://www.tensorflow.org/programmers_guide/graphs) inside your Unity games. This support is possible thanks to [the TensorFlowSharp project](https://github.com/migueldeicaza/TensorFlowSharp). The primary purpose for this support is to use the TensorFlow models produced by the ML-Agents training programs, but a side benefit is that you can use any TensorFlow model.
_Notice: This feature is still experimental. While it is possible to embed trained models into Unity games, Unity Technologies does not officially support this use-case for production games at this time. As such, no guarantees are provided regarding the quality of experience. If you encounter issues regarding battery life, or general performance (especially on mobile), please let us know._

* Unity 2017.1 or above
* Unity Tensorflow Plugin ([Download here](https://s3.amazonaws.com/unity-agents/0.2/TFSharpPlugin.unitypackage))
## Using your own trained graphs
The TensorFlow data graphs produced by the ML-Agents training programs work without any additional settings.
In order to use a TensorFlow data graph in Unity, make sure the nodes of your graph have appropriate names. You can assign names to nodes in TensorFlow:
We recommend using the following naming conventions:
* Name the batch size input placeholder `batch_size`
* Name the input vector observation placeholder `state`
* Name the output node `action`
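As a brief sketch of this convention (assuming TensorFlow 1.x, with the observation and action sizes chosen only for illustration), the placeholders and output node could be named like this:
```python
import tensorflow as tf

# Inputs, named to follow the convention recommended above.
batch_size = tf.placeholder(shape=None, dtype=tf.int32, name="batch_size")
state = tf.placeholder(shape=[None, 8], dtype=tf.float32, name="state")  # 8 = assumed observation size

# ... the body of the network goes here ...
hidden = tf.layers.dense(state, 64, activation=tf.nn.relu)
logits = tf.layers.dense(hidden, 2)  # 2 = assumed action size

# Name the output node "action" so the Internal brain can find it.
action = tf.identity(logits, name="action")
```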

You can have additional placeholders for floats or integers, but they must be placed in placeholders of dimension 1 and size 1. (Be sure to name them.)
It is important that the inputs and outputs of the graph are exactly the ones you receive and return when training your model with an `External` brain. This means you cannot have any operations such as reshaping outside of the graph.
While training your Agent using the Python API, you can save your graph at any point of the training. Note that the argument `output_node_names` must be the name of the tensor your graph outputs (separated by a comma if using multiple outputs). In this case, it will be either `action` or `action,recurrent_out` if you have recurrent outputs.
```python
from tensorflow.python.tools import freeze_graph

clear_devices = True, initializer_nodes = "",input_saver = "",
restore_op_name = "save/restore_all", filename_tensor_name = "save/Const:0")
```
Your model will be saved with the name `your_name_graph.bytes` and will contain both the graph and associated weights. Note that you must save your graph as a .bytes file so Unity can load it.
## Inside Unity
In the Unity Editor, you must specify the names of the nodes used by your graph in the **Internal** brain Inspector window. If you used a scope when defining your graph, specify it in the `Graph Scope` field.
![Internal Brain Inspector](images/internal_brain.png)
Go to `Edit` -> `Player Settings` and add `ENABLE_TENSORFLOW` to the `Scripting Define Symbols` for each type of device you want to use (**`PC, Mac and Linux Standalone`**, **`iOS`** or **`Android`**).
See [Internal Brain](Learning-Environments-Internal-Brains.md) for more information about using the InternalBrain object.
Set the Brain you used for training to `Internal`. Drag `your_name_graph.bytes` into Unity and then drag it into the `Graph Model` field in the Brain.
If you followed these instructions well, the agents in your environment that use this brain will use your fully trained network to make decisions.
# iOS additional instructions for building

# Using TensorFlowSharp without ML-Agents
Beyond controlling an in-game agent, you can also use TensorFlowSharp for more general computation. The following instructions describe how to generally embed Tensorflow models without using the ML-Agents framework.
You must have a TensorFlow graph, such as `your_name_graph.bytes`, made using TensorFlow's `freeze_graph.py`. The process to create such a graph is explained in [Using your own trained graphs](#using-your-own-trained-graphs).
To load and use a TensorFlow data graph in Unity:
1. Put the file, `your_name_graph.bytes`, into Resources.
2. At the top of your C# script, add the line:
```csharp
using TensorFlow;
```
3. If you will be building for Android, you must add this block at the start of your code:
```csharp
#if UNITY_ANDROID
TensorFlowSharp.Android.NativeBinding.Init();
#endif
```
4. Load your graph as a text asset into a variable, such as `graphModel`. You can do this in the Inspector by making `graphModel` a public variable and dragging your asset onto it, or you can load it from the Resources folder:
```csharp
TextAsset graphModel = Resources.Load (your_name_graph) as TextAsset;
```
5. You then must instantiate the graph in Unity by adding the code:
```csharp
graph = new TFGraph ();
graph.Import (graphModel.bytes);
session = new TFSession (graph);
```
6. Assign the input tensors for the graph. For example, the following code assigns a one dimensional input tensor of size 2:
```csharp
var runner = session.GetRunner ();
runner.AddInput (graph ["input_placeholder_name"] [0], new float[]{ placeholder_value1, placeholder_value2 });
```
You must provide all required inputs to the graph. Supply one input per TensorFlow placeholder.
7. To calculate and access the output of your graph, run the following code.
```csharp
runner.Fetch (graph["output_placeholder_name"][0]);
float[,] recurrent_tensor = runner.Run () [0].GetValue () as float[,];
```
Note that this example assumes the output array is a two-dimensional tensor of floats. Cast to a long array if your outputs are integers.

docs/Using-Tensorboard.md (53 changes)


# Using TensorBoard to Observe Training
ML-Agents saves statistics during learning sessions that you can view with a TensorFlow utility named [TensorBoard](https://www.tensorflow.org/programmers_guide/summaries_and_tensorboard).
The `learn.py` program saves training statistics to a folder named `summaries`, organized by the `run-id` value you assign to a training session.
In order to observe the training process, either during training or afterward,
start TensorBoard:
1. Open a terminal or console window.
2. Navigate to the ml-agents/python folder.
3. From the command line, run:
tensorboard --logdir=summaries
4. Open a browser window and navigate to [localhost:6006](http://localhost:6006).
**Note:** If you don't assign a `run-id` identifier, `learn.py` uses the default string, "ppo". All the statistics will be saved to the same sub-folder and displayed as one session in TensorBoard. After a few runs, the displays can become difficult to interpret in this situation. You can delete the folders under the `summaries` directory to clear out old statistics.
On the left side of the TensorBoard window, you can select which of the training runs you want to display. You can select multiple run-ids to compare statistics. The TensorBoard window also provides options for how to display and smooth graphs.
When you run the training program, `learn.py`, you can use the `--save-freq` option to specify how frequently to save the statistics.
## ML-Agents training statistics
The ML-agents training program saves the following statistics:
* Lesson - Plots the progress from lesson to lesson. Only interesting when performing
[curriculum training](Training-Curriculum-Learning.md).
* Cumulative Reward - The mean cumulative episode reward over all agents.
Should increase during a successful training session.
* Entropy - How random the decisions of the model are. Should slowly decrease
during a successful training process. If it decreases too quickly, the `beta`
hyperparameter should be increased.
* Episode Length - The mean length of each episode in the environment for all
agents.
* Learning Rate - How large a step the training algorithm takes as it searches
for the optimal policy. Should decrease over time.
* Policy Loss - The mean loss of the policy function update. Correlates to how
much the policy (process for deciding actions) is changing. The magnitude of
this should decrease during a successful training session.
* Value Estimate - The mean value estimate for all states visited by the agent.
Should increase during a successful training session.
* Value Loss - The mean loss of the value function update. Correlates to how
well the model is able to predict the value of each state. This should decrease
during a successful training session.

docs/dox-ml-agents.conf (15 changes)


# Doxyfile 1.8.13
# This file describes the settings to be used by the documentation system
# doxygen (www.doxygen.org) for a project.
# To generate the C# API documentation, run:
#
# doxygen dox-ml-agents.conf
# All text after a double hash (##) is considered a comment and is placed in
# front of the TAG it is preceding.
#
# All text after a single hash (#) is considered a comment and will be ignored.
# The format is:
# TAG = value [value, ...]
# For lists, items can also be appended using:
# TAG += value [value, ...]
# Values that contain spaces should be placed between quotes (\" \").
# from the ml-agents-docs directory
#---------------------------------------------------------------------------
# Project related configuration options

docs/Learning-Environment-Design-External-Internal-Brains.md (62 changes)


# External and Internal Brains
The **External** and **Internal** types of Brains work in different phases of training. When training your agents, set their brain types to **External**; when using the trained models, set their brain types to **Internal**.
## External Brain
When [running an ML-Agents training algorithm](Training-ML-Agents.md), at least one Brain object in a scene must be set to **External**. This allows the training process to collect the observations of agents using that brain and give the agents their actions.
In addition to using an External brain for training using the ML-Agents learning algorithms, you can use an External brain to control agents in a Unity environment using an external Python program. See [Python API](Python-API.md) for more information.
Unlike the other types, the External Brain has no properties to set in the Unity Inspector window.
## Internal Brain
The Internal Brain type uses a [TensorFlow model](https://www.tensorflow.org/get_started/get_started_for_beginners#models_and_training) to make decisions. The Proximal Policy Optimization (PPO) and Behavioral Cloning algorithms included with the ML-Agents SDK produce trained TensorFlow models that you can use with the Internal Brain type.
A __model__ is a mathematical relationship mapping an agent's observations to its actions. TensorFlow is a software library for performing numerical computation through data flow graphs. A TensorFlow model, then, defines the mathematical relationship between your agent's observations and its actions using a TensorFlow data flow graph.
### Creating a graph model
The training algorithms included in the ML-Agents SDK produce TensorFlow graph models as the end result of the training process. See [Training ML-Agents](Training-ML-Agents.md) for instructions on how to train a model.
### Using a graph model
To use a graph model:
1. Select the Brain GameObject in the **Hierarchy** window of the Unity Editor. (The Brain GameObject must be a child of the Academy Gameobject and must have a Brain component.)
2. Set the **Brain Type** to **Internal**.
**Note:** In order to see the **Internal** Brain Type option, you must [enable TensorFlowSharp](Using-TensorFlow-Sharp-in-Unity.md).
3. Import the `environment_run-id.bytes` file produced by the PPO training program. (Where `environment_run-id` is the name of the model file, which is constructed from the name of your Unity environment executable and the run-id value you assigned when running the training process.)
You can [import assets into Unity](https://docs.unity3d.com/Manual/ImportingAssets.html) in various ways. The easiest way is to simply drag the file into the **Project** window and drop it into an appropriate folder.
4. Once the `environment.bytes` file is imported, drag it from the **Project** window to the **Graph Model** field of the Brain component.
If you are using a model produced by the ML-Agents `learn.py` program, use the default values for the other Internal Brain parameters.
### Internal Brain properties
The default values of the TensorFlow graph parameters work with the model produced by the PPO and BC training code in the ML-Agents SDK. To use a default ML-Agents model, the only parameter that you need to set is the `Graph Model`, which must be set to the .bytes file containing the trained model itself.
![Internal Brain Inspector](images/internal_brain.png)
* `Graph Model` : This must be the `bytes` file corresponding to the pretrained Tensorflow graph. (You must first drag this file into your Resources folder and then from the Resources folder into the inspector)
Only change the following Internal Brain properties if you have created your own TensorFlow model and are not using an ML-Agents model:
* `Graph Scope` : If you set a scope while training your TensorFlow model, all your placeholder names will have a prefix. You must specify that prefix here.
* `Batch Size Node Name` : If the batch size is one of the inputs of your graph, you must specify the name of the placeholder here. The brain will automatically make the batch size equal to the number of agents connected to the brain.
* `State Node Name` : If your graph uses the state as an input, you must specify the name of the placeholder here.
* `Recurrent Input Node Name` : If your graph uses a recurrent input / memory as input and outputs new recurrent input / memory, you must specify the name of the input placeholder here.
* `Recurrent Output Node Name` : If your graph uses a recurrent input / memory as input and outputs new recurrent input / memory, you must specify the name of the output placeholder here.
* `Observation Placeholder Name` : If your graph uses visual observations as input, you must specify their names here. Note that the number of observations is equal to the length of `Camera Resolutions` in the brain parameters.
* `Action Node Name` : Specify the name of the placeholder corresponding to the actions of the brain in your graph. If the action space type is continuous, the output must be a one-dimensional tensor of floats of length `Action Space Size`; if the action space type is discrete, the output must be a one-dimensional tensor of ints of length 1.
* `Graph Placeholder` : If your graph takes additional inputs that are fixed (for example, a noise level), you can specify them here. Note that in your graph, these must correspond to one-dimensional tensors of int or float of size 1.
* `Name` : Corresponds to the name of the placeholder.
* `Value Type` : Either Integer or Floating Point.
* `Min Value` and `Max Value` : Specify the range of the value here. The value will be sampled from the uniform distribution ranging from `Min Value` to `Max Value` inclusive.
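To illustrate how the `Graph Scope` property relates to a graph, here is a small sketch (assuming TensorFlow 1.x; the scope and placeholder names are examples only, not values required by ML-Agents):
```python
import tensorflow as tf

# Defining placeholders inside a scope prefixes their node names,
# e.g. "MyBrain/state" instead of "state".
with tf.variable_scope("MyBrain"):
    state = tf.placeholder(shape=[None, 8], dtype=tf.float32, name="state")

print(state.name)  # -> "MyBrain/state:0"
# "MyBrain" would then be entered in the Graph Scope field of the Internal
# brain so that the node names resolve correctly.
```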

docs/Learning-Environment-Heuristic-Brains.md (2 changes)


# Heuristic Brain

docs/Learning-Environment-Design-Internal-Brains.md (50 changes)


# Internal Brain
The Internal Brain type uses a [TensorFlow model](https://www.tensorflow.org/get_started/get_started_for_beginners#models_and_training) to make decisions. The Proximal Policy Optimization (PPO) algorithm included with the ML-Agents SDK produces a trained TensorFlow model that you can use with the Internal Brain type.
A __model__ is a mathematical relationship mapping an agent's observations to its actions. TensorFlow is a software library for performing numerical computation through data flow graphs. A TensorFlow model, then, defines the mathematical relationship between your agent's observations and its actions using a TensorFlow data flow graph.
## Creating a graph model
The PPO algorithm included with the ML-Agents SDK produces a TensorFlow graph model as the end result of the training process. See [Training with Proximal Policy Optimization](Training-PPO.md) for instructions on how to create and train a model using PPO.
## Using a graph model
To use a graph model:
1. Select the Brain GameObject in the **Hierarchy** window of the Unity Editor. (The Brain GameObject must be a child of the Academy Gameobject and must have a Brain component.)
2. Set the **Brain Type** to **Internal**.
**Note:** In order to see the **Internal** Brain Type option, you must [enable TensorFlowSharp](Using-TensorFlow-Sharp-in-Unity.md).
3. Import the `environment_run-id.bytes` file produced by the PPO training program. (Where `environment_run-id` is the name of the model file, which is constructed from the name of your Unity environment executable and the run-id value you assigned when running the training process.)
You can [import assets into Unity](https://docs.unity3d.com/Manual/ImportingAssets.html) in various ways. The easiest way is to simply drag the file into the **Project** window and drop it into an appropriate folder.
4. Once the `environment.bytes` file is imported, drag it from the **Project** window to the **Graph Model** field of the Brain component.
If you are using a model produced by the ML-Agents PPO program, use the default values for the other Internal Brain parameters.
## Internal Brain properties
The default values of the TensorFlow graph parameters work with the model produced by the PPO training code in the ML-Agents SDK. To use a default PPO model, the only parameter that you need to set is the `Graph Model`, which must be set to the .bytes file containing the trained model itself.
![Internal Brain Inspector](images/internal_brain.png)
* `Graph Model` : This must be the `bytes` file corresponding to the pretrained Tensorflow graph. (You must first drag this file into your Resources folder and then from the Resources folder into the inspector)
Only change the following Internal Brain properties if you have created your own TensorFlow model and are not using the ML-Agents PPO model:
* `Graph Scope` : If you set a scope while training your TensorFlow model, all your placeholder names will have a prefix. You must specify that prefix here.
* `Batch Size Node Name` : If the batch size is one of the inputs of your graph, you must specify the name of the placeholder here. The brain will automatically make the batch size equal to the number of agents connected to the brain.
* `State Node Name` : If your graph uses the state as an input, you must specify the name of the placeholder here.
* `Recurrent Input Node Name` : If your graph uses a recurrent input / memory as input and outputs new recurrent input / memory, you must specify the name of the input placeholder here.
* `Recurrent Output Node Name` : If your graph uses a recurrent input / memory as input and outputs new recurrent input / memory, you must specify the name of the output placeholder here.
* `Observation Placeholder Name` : If your graph uses visual observations as input, you must specify their names here. Note that the number of observations is equal to the length of `Camera Resolutions` in the brain parameters.
* `Action Node Name` : Specify the name of the placeholder corresponding to the actions of the brain in your graph. If the action space type is continuous, the output must be a one-dimensional tensor of floats of length `Action Space Size`; if the action space type is discrete, the output must be a one-dimensional tensor of ints of length 1.
* `Graph Placeholder` : If your graph takes additional inputs that are fixed (for example, a noise level), you can specify them here. Note that in your graph, these must correspond to one-dimensional tensors of int or float of size 1.
* `Name` : Corresponds to the name of the placeholder.
* `Value Type` : Either Integer or Floating Point.
* `Min Value` and `Max Value` : Specify the range of the value here. The value will be sampled from the uniform distribution ranging from `Min Value` to `Max Value` inclusive.