
Documentation Update (#1339)

* Documentation Update

* addressed comments

* new images for the recorder

* Improvements to the docs

* Address the comments

* Core_ML typo

* Updated the links to inference repo

* Put back Inference-Engine.md

* fix typos : brain

* Readd deleted file

* fix typos

* Addressed comments
Branch: /develop-generalizationTraining-TrainerController
GitHub · 6 years ago
Current commit: bd4a8db2
20 files changed, with 425 additions and 354 deletions
  1. README.md (2)
  2. docs/Basic-Guide.md (54)
  3. docs/FAQ.md (2)
  4. docs/Getting-Started-with-Balance-Ball.md (16)
  5. docs/Learning-Environment-Best-Practices.md (2)
  6. docs/Learning-Environment-Create-New.md (14)
  7. docs/Learning-Environment-Design-Academy.md (6)
  8. docs/Learning-Environment-Design-Agents.md (2)
  9. docs/Learning-Environment-Design-Brains.md (6)
  10. docs/Learning-Environment-Design.md (10)
  11. docs/Learning-Environment-Examples.md (6)
  12. docs/Learning-Environment-Executable.md (2)
  13. docs/ML-Agents-Overview.md (4)
  14. docs/Migrating.md (48)
  15. docs/Training-Imitation-Learning.md (10)
  16. docs/Training-ML-Agents.md (2)
  17. docs/Training-PPO.md (4)
  18. docs/Training-on-Microsoft-Azure.md (4)
  19. docs/images/demo_component.png (182)
  20. docs/images/demo_inspector.png (403)

README.md (2)


* For more information, in addition to installation and usage instructions, see
our [documentation home](docs/Readme.md).
* If you are a researcher interested in a discussion of Unity as an AI platform, see a pre-print of our [reference paper on Unity and the ML-Agents Toolkit](https://arxiv.org/abs/1809.02627). Also, see below for instructions on citing this paper.
* If you have used a version of the ML-Agents toolkit prior to v0.6, we strongly
* If you have used an earlier version of the ML-Agents toolkit, we strongly
recommend our [guide on migrating from earlier versions](docs/Migrating.md).
## Additional Resources

docs/Basic-Guide.md (54)


## Setting up the ML-Agents Toolkit within Unity
In order to use the ML-Agents toolkit within Unity, you need to change some
Unity settings first. You will also need to have appropriate inference backends
installed in order to run your models inside of Unity. See [here](Inference-Engine.md)
for more information.
In order to use the ML-Agents toolkit within Unity, you first need to change a few
Unity settings.
1. Launch Unity
2. On the Projects dialog, choose the **Open** option at the top of the window.

5. For **each** of the platforms you target (**PC, Mac and Linux Standalone**,
**iOS** or **Android**):
1. Option the **Other Settings** section.
1. Expand the **Other Settings** section.
## Setting up the Inference Engine
We provide pre-trained models for all the agents in all our demo environments.
To be able to run those models, you'll first need to set-up the Inference
Engine. The Inference Engine is a general API to
run neural network models in Unity that leverages existing inference libraries such
as TensorFlowSharp and Apple's Core ML. Since the ML-Agents Toolkit uses TensorFlow
for training neural network models, the output model format is TensorFlow and
the model files include a `.tf` extension. Consequently, you need to install
the TensorFlowSharp backend to be able to run these models within the Unity
Editor. You can find instructions
on how to install the TensorFlowSharp backend [here](Inference-Engine.md).
Once the backend is installed, you will need to reimport the models : Right click
on the `.tf` model and select `Reimport`.
## Running a Pre-trained Model
1. In the **Project** window, go to `Assets/ML-Agents/Examples/3DBall/Scenes` folder

3. In the `Ball 3D Agent` Component: Drag the **3DBallLearning** located into
3. In the `Ball 3D Agent` Component: Drag the **3DBallLearning** Brain located in
__Note__ : You can modify multiple game objects in a scene by selecting them all at once using the search bar in the Scene Hierarchy.
__Note__ : You can modify multiple game objects in a scene by selecting them all at
once using the search bar in the Scene Hierarchy.
6. Drag the `3DBall` model file from the `Assets/ML-Agents/Examples/3DBall/TFModels`
folder to the **Model** field of the **3DBallLearning**.
6. Drag the `3DBallLearning` model file from the `Assets/ML-Agents/Examples/3DBall/TFModels`
folder to the **Model** field of the **3DBallLearning** Brain.
7. Click the **Play** button and you will see the platforms balance the balls
using the pretrained model.

### Adding a Brain to the training session
Since we are going to build this environment to conduct training, we need to add
the Brain to the training session. This allows the Agents linked to that Brain
to communicate with the external training process when making their decisions.
To set up the environment for training, you will need to specify which agents are contributing
to the training and which Brain is being trained. You can only perform training with
a `Learning Brain`.
1. Assign the **3DBallLearning** to the agents you would like to train and the **3DBallPlayer** Brain to the agents you want to control manually.
__Note:__ You can only perform training with an `Learning Brain`.
1. Assign the **3DBallLearning** Brain to the agents you would like to train.
__Note:__ You can assign the same Brain to multiple agents at once : To do so, you can
use the prefab system. When an agent is created from a prefab, modifying the prefab
will modify the agent as well. If the agent does not synchronize with the prefab, you
can hit the Revert button on top of the Inspector.
Alternatively, you can select multiple agents in the scene and modify their `Brain`
property all at once.
__Note:__ Assigning a Brain to an agent (dragging a Brain into the `Brain` property of
the agent) means that the Brain will be making decisions for that agent, whereas dragging
a Brain into the Broadcast Hub means that the Brain will be exposed to the Python process.
The `Control` checkbox means that in addition to being exposed to Python, the Brain will
be controlled by the Python process (required for training).
![Set Brain to External](images/mlagents-SetBrainToTrain.png)
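
For readers unsure what "exposed to the Python process" means in practice, here is a minimal sketch (not part of this PR), assuming the v0.6-era `mlagents.envs` Python API: Brains placed in the Broadcast Hub show up on the Python side, and only those with `Control` checked can be driven from Python.

```python
# Minimal sketch, assuming the v0.6-era mlagents.envs API; exact attribute
# names may differ slightly between releases.
from mlagents.envs import UnityEnvironment

# file_name=None attaches to the Unity Editor: run this script, then press Play.
env = UnityEnvironment(file_name=None)

# Every Brain dragged into the Academy's Broadcast Hub is visible from Python.
print("Broadcast Brains:", env.brain_names)

# Only Brains whose Control checkbox is ticked accept actions from Python
# (these are the ones that can actually be trained).
print("Controlled Brains:", env.external_brain_names)

env.close()
```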

docs/FAQ.md (2)


## Cannot drag Model into Learning Brain
You migh not have the appropriate backend required to import the model. Refer to the
You might not have the appropriate backend required to import the model. Refer to the
[Inference Engine](Inference-Engine.md) for more information on how to import backends
and reimport the asset.

docs/Getting-Started-with-Balance-Ball.md (16)


The Academy object for the scene is placed on the Ball3DAcademy GameObject. When
you look at an Academy component in the inspector, you can see several
properties that control how the environment works.
The **Broadcast Hub** keeps track of which brains will send data during training,
If a brain is added to the hub, his data will be sent to the external training
The **Broadcast Hub** keeps track of which Brains will send data during training,
If a Brain is added to the hub, his data will be sent to the external training
control the agents linked to the brain to train them.
control the agents linked to the Brain to train them.
The **Training** and **Inference Configuration** properties
set the graphics and timescale properties for the Unity application.
The Academy uses the **Training Configuration** during training and the

### Brain
Brains are assets that exist in your project folder. The Ball3DAgents are connected
to a brain, for example : the **3DBallLearning**.
to a Brain, for example : the **3DBallLearning**.
A Brain doesn't store any information about an Agent, it just
routes the Agent's collected observations to the decision making process and
returns the chosen action to the Agent. Thus, all Agents can share the same

You can create brain objects by selecting `Assets ->
Create -> ML-Agents -> Brain`. There are 3 kinds of brains :
The **Learning Brain** is a brain that uses a Neural Network to take decisions.
You can create Brain objects by selecting `Assets ->
Create -> ML-Agents -> Brain`. There are 3 kinds of Brains :
The **Learning Brain** is a Brain that uses a Neural Network to take decisions.
When the Brain is checked as `Control` in the Academy **Broadcast Hub**, the
external process will be taking decisions for the agents
and generate a neural network when the training is over. You can also use the

The `--train` flag tells the ML-Agents toolkit to run in training mode.
**Note**: You can train using an executable rather than the Editor. To do so,
follow the intructions in
follow the instructions in
[Using an Executable](Learning-Environment-Executable.md).
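
To make the Editor-versus-executable distinction concrete, here is a rough sketch, again assuming the v0.6-era `mlagents.envs` API; the `./3DBall` build path below is a placeholder, not a file from this PR.

```python
# Sketch only: the build path below is hypothetical.
from mlagents.envs import UnityEnvironment

# Editor-based run: file_name=None, then press Play in Unity.
# env = UnityEnvironment(file_name=None)

# Executable-based run: point file_name at your standalone build instead.
env = UnityEnvironment(file_name="./3DBall", worker_id=0)

# train_mode=True asks the Academy to use its Training Configuration
# (faster timescale, reduced graphics) rather than the Inference Configuration.
info = env.reset(train_mode=True)

env.close()
```
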
### Observing Training Progress

docs/Learning-Environment-Best-Practices.md (2)


## General
* It is often helpful to start with the simplest version of the problem, to
ensure the agent can learn it. From there increase complexity over time. This
ensure the agent can learn it. From there, increase complexity over time. This
can either be done manually, or via Curriculum Learning, where a set of
lessons which progressively increase in difficulty are presented to the agent
([learn more here](Training-Curriculum-Learning.md)).

docs/Learning-Environment-Create-New.md (14)


## Add Brains
The Brain object encapsulates the decision making process. An Agent sends its
observations to its Brain and expects a decision in return. The type of the brain
observations to its Brain and expects a decision in return. The type of the Brain
1. Go to `Assets -> Create -> ML-Agents` and select the type of brain you want to
1. Go to `Assets -> Create -> ML-Agents` and select the type of Brain you want to
create. In this tutorial, we will create a **Learning Brain** and
a **Player Brain**.
2. Name them `RollerBallBrain` and `RollerBallPlayer` respectively.

setting the Brain properties so that they are compatible with our Agent code.
1. In the Academy Inspector, add the `RollerBallBrain` and `RollerBallPlayer`
brains to the **Broadcast Hub**.
Brains to the **Broadcast Hub**.
2. Select the RollerAgent GameObject to show its properties in the Inspector
window.
3. Drag the Brain `RollerBallPlayer` from the Project window to the

Also, drag the Target GameObject from the Hierarchy window to the RollerAgent
Target field.
Finally, select the the `RollerBallBrain` and `RollerBallPlayer` brains assets
Finally, select the the `RollerBallBrain` and `RollerBallPlayer` Brain assets
so that you can edit their properties in the Inspector window. Set the following
properties on both of them:

## Testing the Environment
It is always a good idea to test your environment manually before embarking on
an extended training run. The reason we have created the `RollerBallPlayer` brain
an extended training run. The reason we have created the `RollerBallPlayer` Brain
is so that we can control the Agent using direct keyboard
control. But first, you need to define the keyboard to action mapping. Although
the RollerAgent only has an `Action Size` of two, we will use one key to specify

1. Select the `RollerBallPlayer` brain to view its properties in the Inspector.
1. Select the `RollerBallPlayer` Brain to view its properties in the Inspector.
a player brain).
a **PlayerBrain**).
3. Set **Size** to 4.
4. Set the following mappings:

docs/Learning-Environment-Design-Academy.md (6)


## Academy Properties
![Academy Inspector](images/academy.png)
* `Broadcast Hub` - Gathers the brains that will communicate with the external
process. Any brain added to the Broadcast Hub will be visible from the external
process. In addition, if the checkbox `Control` is checked, the brain will be
* `Broadcast Hub` - Gathers the Brains that will communicate with the external
process. Any Brain added to the Broadcast Hub will be visible from the external
process. In addition, if the checkbox `Control` is checked, the Brain will be
controllable from the external process and will thus be trainable.
* `Max Steps` - Total number of steps per-episode. `0` corresponds to episodes
without a maximum number of steps. Once the step counter reaches maximum, the

docs/Learning-Environment-Design-Agents.md (2)


that you can use the same Brain in multiple Agents. How a Brain makes its
decisions depends on the kind of Brain it is. A Player Brain allows you
to directly control the agent. A Heuristic Brain allows you to create a
decision script to control the agent with a set of rules. These two brains
decision script to control the agent with a set of rules. These two Brains
do not involve neural networks but they can be useful for debugging. The
Learning Brain allows you to train and use neural network models for
your Agents. See [Brains](Learning-Environment-Design-Brains.md).

docs/Learning-Environment-Design-Brains.md (6)


can also create several Brains, attach each of the Brain to one or more than one
Agent.
There are 3 kinds of brains you can use:
There are 3 kinds of Brains you can use:
* [Learning](Learning-Environment-Learning-Brains.md) – Use a
* [Learning](Learning-Environment-Design-Learning-Brains.md) – Use a
**LearningBrain** to make use of a trained model or train a new model.
* [Heuristic](Learning-Environment-Design-Heuristic-Brains.md) – Use a
**HeuristicBrain** to hand-code the Agent's logic by extending the Decision class.

* `Action Descriptions` - A list of strings used to name the available
actions for the Brain.
The other properties of the brain depend on the type of Brain you are using.
The other properties of the Brain depend on the type of Brain you are using.
## Using the Broadcast Feature

docs/Learning-Environment-Design.md (10)


To train and use the ML-Agents toolkit in a Unity scene, the scene must contain
a single Academy subclass and as many Agent subclasses
as you need. The brain assets are present in the project and should be grouped
as you need. The Brain assets are present in the project and should be grouped
together and named according to the type of agents they are compatible with.
Agent instances should be attached to the GameObject representing that Agent.

The Brain encapsulates the decision making process. Every Agent must be
assigned a Brain, but you can use the same Brain with more than one Agent.
__Note__:You can assign the same brain to multiple agents by using prefabs
or by selecting all the agents you want to attach the brain to using the
__Note__:You can assign the same Brain to multiple agents by using prefabs
or by selecting all the agents you want to attach the Brain to using the
type of brain you want to use. During training, use a **Learning Brain**
type of Brain you want to use. During training, use a **Learning Brain**
different types of Brains. You can create new kinds of brains if the three
different types of Brains. You can create new kinds of Brains if the three
built-in don't do what you need.
The Brain class has several important properties that you can set using the

docs/Learning-Environment-Examples.md (6)


* Set-up: A platforming environment where the agent can push a block around.
* Goal: The agent must push the block to the goal.
* Agents: The environment contains one agent linked to a single brain.
* Agents: The environment contains one agent linked to a single Brain.
* Brains: One brain with the following observation/action space.
* Brains: One Brain with the following observation/action space.
* Vector Observation space: (Continuous) 70 variables corresponding to 14
ray-casts each detecting one of three possible objects (wall, goal, or
block).

![Reacher](images/reacher.png)
* Set-up: Double-jointed arm which can move to target locations.
* Goal: The agents must move it's hand to the goal location, and keep it there.
* Goal: The agents must move its hand to the goal location, and keep it there.
* Agents: The environment contains 10 agents linked to a single Brain.
* Agent Reward Function (independent):
* +0.1 Each step agent's hand is in goal location.

docs/Learning-Environment-Executable.md (2)


![3DBall Scene](images/mlagents-Open3DBall.png)
Make sure the Brains in the scene have the right type. For example, if you want
to be able to control your agents from Python, you will need to put the brain
to be able to control your agents from Python, you will need to put the Brain
controlling the Agents to be a **Learning Brain** and drag it into the
Academy's `Broadcast Hub` with the `Control` checkbox checked.

docs/ML-Agents-Overview.md (4)


As mentioned previously, the ML-Agents toolkit ships with several
implementations of state-of-the-art algorithms for training intelligent agents.
In this mode, the only brain used is a **Learning Brain**. More
In this mode, the only Brain used is a **Learning Brain**. More
specifically, during training, all the medics in the
scene send their observations to the Python API through the External
Communicator (this is the behavior with an External Brain). The Python API

observations for all its Agents to the Python API when dragged into the
Academy's `Broadcast Hub` with the `Control` checkbox checked. This is helpful
for training and later inference. Broadcasting is a feature which can be
enabled all types of brains (Player, Learning, Heuristic) where the Agent
enabled all types of Brains (Player, Learning, Heuristic) where the Agent
observations and actions are also sent to the Python API (despite the fact
that the Agent is **not** controlled by the Python API). This feature is
leveraged by Imitation Learning, where the observations and actions for a
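
To illustrate the broadcast idea, here is a rough sketch of reading observations and rewards from a Brain that is broadcast but not controlled, which is essentially what the imitation-learning trainer does with the teacher. It assumes the v0.6-era `mlagents.envs` API (call signatures vary between releases), and the Brain name "Teacher" is only an example.

```python
# Sketch only: "Teacher" is a hypothetical Player/Heuristic Brain that sits in
# the Broadcast Hub with the Control checkbox left unchecked.
from mlagents.envs import UnityEnvironment

env = UnityEnvironment(file_name=None)

# reset()/step() return a dict of BrainInfo objects keyed by Brain name,
# including broadcast-only Brains.
info = env.reset(train_mode=False)
teacher = info["Teacher"]
print(teacher.vector_observations)   # what the teacher's Agents observed
print(teacher.rewards)               # the rewards they just received

# No actions are sent for a broadcast-only Brain; the Player/Heuristic logic
# inside Unity keeps driving it while Python simply watches.
info = env.step()
env.close()
```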

docs/Migrating.md (48)


# Migrating
## Migrating from ML-Agents toolkit v0.5 to v0.6
* Brains are now Scriptable Objects instead of MonoBehaviors. This will
allow you to set Brains into prefabs and use the same brains across
scenes.
* Brains are now Scriptable Objects instead of MonoBehaviors.
* You can no longer modify the type of a Brain. If you want to switch
between `PlayerBrain` and `LearningBrain` for multiple agents,
you will need to assign a new Brain to each agent separately.
__Note:__ You can pass the same Brain to multiple agents in a scene by
leveraging Unity's prefab system or look for all the agents in a scene
using the search bar of the `Hierarchy` window with the word `Agent`.
* Remove the `Brain` GameObjects in the scene
* Remove the `Brain` GameObjects in the scene. (Delete all of the
Brain GameObjects under Academy in the scene.)
ML-Agents`
ML-Agents` for each type of the Brain you plan to use, and put
the created files under a folder called Brains within your project.
in the `Brain` GameObjects
in the `Brain` GameObjects.
appropriate Brain asset in it.
appropriate Brain ScriptableObject in it.
__Note:__ You can pass the same brain to multiple agents in a scene by
leveraging Unity's prefab system or look for all the agents in a scene
using the search bar of the `Hierarchy` window with the word `Agent`.
__Note:__ You will need to delete the previous TensorFlowSharp package
and install the new one to do inference. To correctly delete the previous
TensorFlowSharp package, Delete all of the files under `ML-Agents/Plugins`
folder except the files under `ML-Agents/Plugins/ProtoBuffer`.
* We replaced the **Internal** and **External** Brain with **Learning Brain**.
When you need to train a model, you need to drag it into the `Training Hub`
inside the `Academy` and check the `Control` checkbox.
* We removed the `Broadcast` checkbox of the Brain, to use the broadcast
functionality, you need to drag the Brain into the `Broadcast Hub`.
* When training multiple Brains at the same time, each model is now stored
into a separate model file rather than in the same file under different
graph scopes.
* We have changed the way ML-Agents models perform inference. All previous `.bytes`
files can no longer be used (you will have to retrain them). The models
produced by the training process and the shipped models have now a `.tf`
extension and use TensorflowSharp as a backend for the
[Inference Engine](Inference-Engine.md).
* To use a `.tf` model, drag it inside the `Model` property of the `Learning Brain`
## Migrating from ML-Agents toolkit v0.4 to v0.5

[curriculum learning documentation](Training-Curriculum-Learning.md)
for detailed information. In summary:
* Curriculum files for the same environment must now be placed into a folder.
Each curriculum file should be named after the brain whose curriculum it
Each curriculum file should be named after the Brain whose curriculum it
specifies.
* `min_lesson_length` now specifies the minimum number of episodes in a lesson
and affects reward thresholding.

docs/Training-Imitation-Learning.md (10)


6. Launch `mlagents-learn`, providing `./config/offline_bc_config.yaml` as the config parameter and your environment as the `--env` parameter.
7. (Optional) Observe training performance using Tensorboard.
This will use the demonstration file to train a nerual network driven agent to directly imitate the actions provided in the demonstration. The environment will launch and be used for evaluating the agent's performance during training.
This will use the demonstration file to train a neural network driven agent to directly imitate the actions provided in the demonstration. The environment will launch and be used for evaluating the agent's performance during training.
### Online Training

will be the "Student." We will assume that the names of the Brain
`Assets`s are "Teacher" and "Student" respectively.
Assets are "Teacher" and "Student" respectively.
4. The Brain Parameters of both the "Teacher" and "Student" brains must be
4. The Brain Parameters of both the "Teacher" and "Student" Brains must be
5. Drag both the "Teacher" and "Student" brain into the Academy's `Broadcast Hub`
and check the `Control` checkbox on the "Student" brain.
5. Drag both the "Teacher" and "Student" Brain into the Academy's `Broadcast Hub`
and check the `Control` checkbox on the "Student" Brain.
4. Link the Brains to the desired Agents (one Agent as the teacher and at least
one Agent as a student).
5. In `config/online_bc_config.yaml`, add an entry for the "Student" Brain. Set

docs/Training-ML-Agents.md (2)


| brain\_to\_imitate | For online imitation learning, the name of the GameObject containing the Brain component to imitate. | (online)BC |
| demo_path | For offline imitation learning, the file path of the recorded demonstration file | (offline)BC |
| buffer_size | The number of experiences to collect before updating the policy model. | PPO |
| curiosity\_enc\_size | The size of the encoding to use in the forward and inverse models in the Curioity module. | PPO |
| curiosity\_enc\_size | The size of the encoding to use in the forward and inverse models in the Curiosity module. | PPO |
| curiosity_strength | Magnitude of intrinsic reward generated by Intrinsic Curiosity Module. | PPO |
| epsilon | Influences how rapidly the policy can evolve during training. | PPO |
| gamma | The reward discount rate for the Generalized Advantage Estimator (GAE). | PPO |

docs/Training-PPO.md (4)


The below hyperparameters are only used when `use_curiosity` is set to true.
### Curioisty Encoding Size
### Curiosity Encoding Size
`curiosity_enc_size` corresponds to the size of the hidden layer used to encode
the observations within the intrinsic curiosity module. This value should be

`curiosity_strength` corresponds to the magnitude of the intrinsic reward
generated by the intrinsic curiosity module. This should be scaled in order to
ensure it is large enough to not be overwhelmed by extrnisic reward signals in
ensure it is large enough to not be overwhelmed by extrinsic reward signals in
the environment. Likewise it should not be too large to overwhelm the extrinsic
reward signal.
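
As a purely illustrative example of the balance described above (hypothetical numbers, not the trainer's internal code), the intrinsic reward can be thought of as being scaled by `curiosity_strength` before it is combined with the extrinsic reward:

```python
# Illustrative only; hypothetical values, not the trainer's actual implementation.
curiosity_strength = 0.01   # value set in the trainer configuration
extrinsic_reward = 1.0      # reward granted by the environment this step
intrinsic_reward = 5.0      # surprise signal from the curiosity module
# The scale factor keeps the intrinsic term from overwhelming the extrinsic
# reward, while still leaving it large enough to matter.
total_reward = extrinsic_reward + curiosity_strength * intrinsic_reward
print(total_reward)  # 1.05
```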

docs/Training-on-Microsoft-Azure.md (4)


## Pre-Configured Azure Virtual Machine
A pre-configured virtual machine image is available in the Azure Marketplace and
is nearly compltely ready for training. You can start by deploying the
is nearly completely ready for training. You can start by deploying the
[Data Science Virtual Machine for Linux (Ubuntu)](https://azuremarketplace.microsoft.com/marketplace/apps/microsoft-ads.linux-data-science-vm-ubuntu)
into your Azure subscription. Once your VM is deployed, SSH into it and run the
following command to complete dependency installation:

```
Where `<your_app>` is the path to your app (i.e.
`~/unity-volume/3DBallHeadless`) and `<run_id>` is an identifer you would like
`~/unity-volume/3DBallHeadless`) and `<run_id>` is an identifier you would like
to identify your training run with.
If you've selected to run on a N-Series VM with GPU support, you can verify that

docs/images/demo_component.png (182)

Before / After
Width: 756 | Height: 160 | Size: 28 KiB

docs/images/demo_inspector.png (403)

Before / After
Width: 786 | Height: 540 | Size: 63 KiB