Updating docs with new paths.

6 years ago · 709141f4
--- a/docs/API-Reference.md
+++ b/docs/API-Reference.md
 # API Reference

-Our developer-facing C# classes (Academy, Agent, Decision and 
-Monitor) have been documented to be compatabile with 
-[Doxygen](http://www.stack.nl/~dimitri/doxygen/) for auto-generating HTML 
+Our developer-facing C# classes (Academy, Agent, Decision and Monitor) have been
+documented to be compatible with
+[Doxygen](http://www.stack.nl/~dimitri/doxygen/) for auto-generating HTML
-To generate the API reference, 
-[download Doxygen](http://www.stack.nl/~dimitri/doxygen/download.html) and run
-the following command within the `docs/` directory:
+To generate the API reference, [download
+Doxygen](http://www.stack.nl/~dimitri/doxygen/download.html) and run the
+following command within the `docs/` directory:
-that includes the classes that have been properly formatted. 
-The generated HTML files will be placed
-in the `html/` subdirectory. Open `index.html` within that subdirectory to 
-navigate to the API reference home. Note that `html/` is already included in 
-the repository's `.gitignore` file.
+that includes the classes that have been properly formatted. The generated HTML
+files will be placed in the `html/` subdirectory. Open `index.html` within that
+subdirectory to navigate to the API reference home. Note that `html/` is already
+included in the repository's `.gitignore` file.
-In the near future, we aim to expand our documentation
-to include all the Unity C# classes and Python API.
+In the near future, we aim to expand our documentation to include all the Unity
+C# classes and Python API.
--- a/docs/Background-Jupyter.md
+++ b/docs/Background-Jupyter.md
 # Background: Jupyter

-[Jupyter](https://jupyter.org) is a fantastic tool for writing code with 
-embedded visualizations. We provide one such notebook, `python/notebooks/getting-started.ipynb`, 
-for testing the Python control interface to a Unity build. This notebook is 
-introduced in the 
-[Getting Started with the 3D Balance Ball Environment](Getting-Started-with-Balance-Ball.md)
+[Jupyter](https://jupyter.org) is a fantastic tool for writing code with
+embedded visualizations. We provide one such notebook,
+`notebooks/getting-started.ipynb`, for testing the Python control
+interface to a Unity build. This notebook is introduced in the [Getting Started
+with the 3D Balance Ball Environment](Getting-Started-with-Balance-Ball.md)
-in the _Jupyter/IPython Quick Start Guide_. To launch Jupyter, run in the command line:
+in the _Jupyter/IPython Quick Start Guide_. To launch Jupyter, run in the
+command line:
-`jupyter notebook`
+    jupyter notebook

 Then navigate to `localhost:8888` to access your notebooks.
--- a/docs/Basic-Guide.md
+++ b/docs/Basic-Guide.md
 # Basic Guide

-This guide will show you how to use a pretrained model in an example Unity environment, and show you how to train the model yourself.
+This guide will show you how to use a pretrained model in an example Unity
+environment, and show you how to train the model yourself.
-If you are not familiar with the [Unity Engine](https://unity3d.com/unity),
-we highly recommend the [Roll-a-ball tutorial](https://unity3d.com/learn/tutorials/s/roll-ball-tutorial) to learn all the basic concepts of Unity. 
+If you are not familiar with the [Unity Engine](https://unity3d.com/unity), we
+highly recommend the [Roll-a-ball
+tutorial](https://unity3d.com/learn/tutorials/s/roll-ball-tutorial) to learn all
+the basic concepts of Unity.
-In order to use the ML-Agents toolkit within Unity, you need to change some Unity settings first. Also [TensorFlowSharp plugin](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage) is needed for you to use pretrained model within Unity, which is based on the [TensorFlowSharp repo](https://github.com/migueldeicaza/TensorFlowSharp). 
+In order to use the ML-Agents toolkit within Unity, you need to change some
+Unity settings first. You also need the [TensorFlowSharp
+plugin](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage),
+which is based on the [TensorFlowSharp
+repo](https://github.com/migueldeicaza/TensorFlowSharp), to use a pretrained
+model within Unity.
-3. Using the file dialog that opens, locate the `unity-environment` folder within the the ML-Agents toolkit project and click **Open**.
+3. Using the file dialog that opens, locate the `MLAgentsSDK` folder
+   within the ML-Agents toolkit project and click **Open**.
-5. For **each** of the platforms you target 
-(**PC, Mac and Linux Standalone**, **iOS** or **Android**):
+5. For **each** of the platforms you target (**PC, Mac and Linux Standalone**,
+   **iOS** or **Android**):
-    2. Select **Scripting Runtime Version** to 
-    **Experimental (.NET 4.6 Equivalent or .NET 4.x Equivalent)**
-    3. In **Scripting Defined Symbols**, add the flag `ENABLE_TENSORFLOW`. 
-    After typing in the flag name, press Enter.
+    2. Set **Scripting Runtime Version** to **Experimental (.NET 4.6
+       Equivalent or .NET 4.x Equivalent)**
+    3. In **Scripting Defined Symbols**, add the flag `ENABLE_TENSORFLOW`. After
+       typing in the flag name, press Enter.
-[Download](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage) the TensorFlowSharp plugin. Then import it into Unity by double clicking the downloaded file.  You can check if it was successfully imported by checking the TensorFlow files in the Project window under **Assets** > **ML-Agents** > **Plugins** > **Computer**. 
+[Download](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage)
+the TensorFlowSharp plugin. Then import it into Unity by double-clicking the
+downloaded file. You can verify that the import succeeded by checking for the
+TensorFlow files in the Project window under **Assets** > **ML-Agents** >
+**Plugins** > **Computer**.
-**Note**: If you don't see anything under **Assets**, drag the `ml-agents/unity-environment/Assets/ML-Agents` folder under **Assets** within Project window.
+**Note**: If you don't see anything under **Assets**, drag the
+`ml-agents/MLAgentsSDK/Assets/ML-Agents` folder under **Assets** within
+the Project window.
-1. In the **Project** window, go to `Assets/ML-Agents/Examples/3DBall` folder and open the `3DBall` scene file. 
-2. In the **Hierarchy** window, select the **Ball3DBrain** child under the **Ball3DAcademy** GameObject to view its properties in the Inspector window.
-3. On the **Ball3DBrain** object's **Brain** component, change the **Brain Type** to **Internal**.
-4. In the **Project** window, locate the `Assets/ML-Agents/Examples/3DBall/TFModels` folder.
-5. Drag the `3DBall` model file from the `TFModels` folder to the **Graph Model** field of the **Ball3DBrain** object's **Brain** component.
-5. Click the **Play** button and you will see the platforms balance the balls using the pretrained model.
+1. In the **Project** window, go to the `Assets/ML-Agents/Examples/3DBall` folder
+   and open the `3DBall` scene file.
+2. In the **Hierarchy** window, select the **Ball3DBrain** child under the
+   **Ball3DAcademy** GameObject to view its properties in the Inspector window.
+3. On the **Ball3DBrain** object's **Brain** component, change the **Brain
+   Type** to **Internal**.
+4. In the **Project** window, locate the
+   `Assets/ML-Agents/Examples/3DBall/TFModels` folder.
+5. Drag the `3DBall` model file from the `TFModels` folder to the **Graph
+   Model** field of the **Ball3DBrain** object's **Brain** component.
+6. Click the **Play** button and you will see the platforms balance the balls
+   using the pretrained model.
-The `python/Basics` [Jupyter notebook](Background-Jupyter.md) contains a 
-simple walkthrough of the functionality of the Python 
-API. It can also serve as a simple test that your environment is configured
-correctly. Within `Basics`, be sure to set `env_name` to the name of the 
-Unity executable if you want to [use an executable](Learning-Environment-Executable.md) or to `None` if you want to interact with the current scene in the Unity Editor.
+The `notebooks/getting-started.ipynb` [Jupyter notebook](Background-Jupyter.md)
+contains a simple walkthrough of the functionality of the Python API. It can
+also serve as a simple test that your environment is configured correctly.
+Within the notebook, be sure to set `env_name` to the name of the Unity executable
+if you want to [use an executable](Learning-Environment-Executable.md) or to
+`None` if you want to interact with the current scene in the Unity Editor.
+
-Since we are going to build this environment to conduct training, we need to 
-set the brain used by the agents to **External**. This allows the agents to 
+
+Since we are going to build this environment to conduct training, we need to set
+the brain used by the agents to **External**. This allows the agents to
-1. In the **Scene** window, click the triangle icon next to the Ball3DAcademy 
-object.
+1. In the **Scene** window, click the triangle icon next to the Ball3DAcademy
+   object.
 2. Select its child object **Ball3DBrain**.
 3. In the Inspector window, set **Brain Type** to **External**.

-1. Open a command or terminal window. 
-2. Nagivate to the folder where you installed the ML-Agents toolkit. 
-3. Change to the `python` directory. 
-4. Run `python3 learn.py --run-id=<run-identifier> --train`
-Where:
- `<run-identifier>` is a string used to separate the results of different training runs
- And the `--train` tells learn.py to run a training session (rather than inference)
-5. When the message _"Start training by pressing the Play button in the Unity Editor"_ is displayed on the screen, you can press the :arrow_forward: button in Unity to start training in the Editor.
-**Note**: Alternatively, you can use an executable rather than the Editor to perform training. Please refer to [this page](Learning-Environment-Executable.md) for instructions on how to build and use an executable.
+1. Open a command or terminal window.
+2. Navigate to the folder where you installed the ML-Agents toolkit.
+3. Run `learn <trainer-config-path> --run-id=<run-identifier> --train`, where:
+    - `<trainer-config-path>` is the relative or absolute filepath of the
+      trainer configuration. The defaults used by environments in the ML-Agents
+      SDK can be found in `trainer_config.yaml`.
+    - `<run-identifier>` is a string used to separate the results of different
+      training runs.
+    - `--train` tells learn.py to run a training session (rather than
+      inference).
+4. When the message _"Start training by pressing the Play button in the Unity
+   Editor"_ is displayed on the screen, you can press the :arrow_forward: button
+   in Unity to start training in the Editor.
+
+**Note**: Alternatively, you can use an executable rather than the Editor to
+perform training. Please refer to [this
+page](Learning-Environment-Executable.md) for instructions on how to build and
+use an executable.
-**Note**: If you're using Anaconda, don't forget to activate the ml-agents environment first.
+**Note**: If you're using Anaconda, don't forget to activate the ml-agents
+environment first.
-If the learn.py runs correctly and starts training, you should see something like this:
+If learn.py runs correctly and starts training, you should see something
+like this:
-You can press Ctrl+C to stop the training, and your trained model will be at `ml-agents/python/models/<run-identifier>/editor_<academy_name>_<run-identifier>.bytes` where `<academy_name>` is the name of the Academy GameObject in the current scene. This file corresponds to your model's latest checkpoint. You can now embed this trained model into your internal brain by following the steps below, which is similar to the steps described [above](#play-an-example-environment-using-pretrained-model).  
-1. Move your model file into 
-`unity-environment/Assets/ML-Agents/Examples/3DBall/TFModels/`.
+You can press Ctrl+C to stop the training, and your trained model will be at
+`models/<run-identifier>/editor_<academy_name>_<run-identifier>.bytes` where
+`<academy_name>` is the name of the Academy GameObject in the current scene.
+This file corresponds to your model's latest checkpoint. You can now embed this
+trained model into your internal brain by following the steps below, which are
+similar to the steps described
+[above](#play-an-example-environment-using-pretrained-model).
+
+1. Move your model file into
+   `MLAgentsSDK/Assets/ML-Agents/Examples/3DBall/TFModels/`.
-5. Drag the `<env_name>_<run-identifier>.bytes` file from the Project window of the Editor
-to the **Graph Model** placeholder in the **Ball3DBrain** inspector window.
+5. Drag the `<env_name>_<run-identifier>.bytes` file from the Project window of
+   the Editor to the **Graph Model** placeholder in the **Ball3DBrain**
+   inspector window.
-* For more information on the ML-Agents toolkit, in addition to helpful background, check out the [ML-Agents Toolkit Overview](ML-Agents-Overview.md) page.
-* For a more detailed walk-through of our 3D Balance Ball environment, check out the [Getting Started](Getting-Started-with-Balance-Ball.md) page.
-* For a "Hello World" introduction to creating your own learning environment, check out the [Making a New Learning Environment](Learning-Environment-Create-New.md) page.
-* For a series of Youtube video tutorials, checkout the [Machine Learning Agents PlayList](https://www.youtube.com/playlist?list=PLX2vGYjWbI0R08eWQkO7nQkGiicHAX7IX) page. 
+- For more information on the ML-Agents toolkit, in addition to helpful
+  background, check out the [ML-Agents Toolkit Overview](ML-Agents-Overview.md)
+  page.
+- For a more detailed walk-through of our 3D Balance Ball environment, check out
+  the [Getting Started](Getting-Started-with-Balance-Ball.md) page.
+- For a "Hello World" introduction to creating your own learning environment,
+  check out the [Making a New Learning
+  Environment](Learning-Environment-Create-New.md) page.
+- For a series of YouTube video tutorials, check out the [Machine Learning Agents
+  PlayList](https://www.youtube.com/playlist?list=PLX2vGYjWbI0R08eWQkO7nQkGiicHAX7IX)
+  page.
--- a/docs/FAQ.md
+++ b/docs/FAQ.md
 # Frequently Asked Questions

-
-### Scripting Runtime Environment not setup correctly
+## Scripting Runtime Environment not setup correctly
-If you haven't switched your scripting runtime version from .NET 3.5 to .NET 4.6 or .NET 4.x, you will see such error message:
+If you haven't switched your scripting runtime version from .NET 3.5 to .NET 4.6
+or .NET 4.x, you will see an error message like this:
-This is because .NET 3.5 doesn't support method Clear() for StringBuilder, refer to [Setting Up The ML-Agents Toolkit Within Unity](Installation.md#setting-up-ml-agent-within-unity) for solution. 
+This is because .NET 3.5 doesn't support the `Clear()` method of StringBuilder.
+Refer to [Setting Up The ML-Agents Toolkit Within
+Unity](Installation.md#setting-up-ml-agent-within-unity) for the solution.
-### TensorFlowSharp flag not turned on. 
+## TensorFlowSharp flag not turned on
-If you have already imported the TensorFlowSharp plugin, but havn't set ENABLE_TENSORFLOW flag for your scripting define symbols, you will see the following error message:
+If you have already imported the TensorFlowSharp plugin, but haven't set the
+ENABLE_TENSORFLOW flag for your scripting define symbols, you will see the
+following error message:
-You need to install and enable the TensorFlowSharp plugin in order to use the internal brain. 
+You need to install and enable the TensorFlowSharp plugin in order to use the internal brain.
-This error message occurs because the TensorFlowSharp plugin won't be usage without the ENABLE_TENSORFLOW flag, refer to [Setting Up The ML-Agents Toolkit Within Unity](Installation.md#setting-up-ml-agent-within-unity) for solution. 
+This error message occurs because the TensorFlowSharp plugin won't be usable
+without the ENABLE_TENSORFLOW flag. Refer to [Setting Up The ML-Agents Toolkit
+Within Unity](Installation.md#setting-up-ml-agent-within-unity) for the solution.
-### Tensorflow epsilon placeholder error
+## Tensorflow epsilon placeholder error
-If you have a graph placeholder set in the internal Brain inspector that is not present in the TensorFlow graph, you will see some error like this:
+If you have a graph placeholder set in the internal Brain inspector that is not
+present in the TensorFlow graph, you will see an error like this:
-UnityAgentsException: One of the Tensorflow placeholder could not be found. In brain <some_brain_name>, there are no FloatingPoint placeholder named <some_placeholder_name>. 
+UnityAgentsException: One of the Tensorflow placeholder could not be found. In brain <some_brain_name>, there are no FloatingPoint placeholder named <some_placeholder_name>.
-Solution: Go to all of your Brain object, find `Graph placeholders` and change its `size` to 0 to remove the `epsilon` placeholder. 
+Solution: Go to each of your Brain objects, find `Graph placeholders`, and
+change its `size` to 0 to remove the `epsilon` placeholder.
-Similarly, if you have a graph scope set in the internal Brain inspector that is not correctly set, you will see some error like this:
+Similarly, if the graph scope in the internal Brain inspector is not set
+correctly, you will see an error like this:
-Solution: Make sure your Graph Scope field matches the corresponding brain object name in your Hierachy Inspector when there is multiple brain. 
+Solution: Make sure your Graph Scope field matches the corresponding Brain
+object name in your Hierarchy Inspector when there are multiple Brains.
-### Environment Permission Error
+## Environment Permission Error
-If you directly import your Unity environment without building it in the 
-editor, you might need to give it additional permissions to execute it. 
+If you directly import your Unity environment without building it in the
+editor, you might need to give it additional permissions to execute it.
-`chmod -R 755 *.app` 
+```shell
+chmod -R 755 *.app
+```
-`chmod -R 755 *.x86_64` 
+```shell
+chmod -R 755 *.x86_64
+```
-On Windows, you can find 
+On Windows, you can find
-### Environment Connection Timeout
+## Environment Connection Timeout
-If you are able to launch the environment from `UnityEnvironment` but 
-then receive a timeout error, there may be a number of possible causes.
- * _Cause_: There may be no Brains in your environment which are set 
- to `External`.  In this case, the environment will not attempt to 
- communicate with python. _Solution_: Set the Brains(s) you wish to 
- externally control through the Python API to `External` from the 
- Unity Editor, and rebuild the environment.
- * _Cause_: On OSX, the firewall may be preventing communication with 
- the environment. _Solution_: Add the built environment binary to the 
- list of exceptions on the firewall by following 
- [instructions](https://support.apple.com/en-us/HT201642). 
- * _Cause_: An error happened in the Unity Environment preventing 
- communication. _Solution_: Look into the 
- [log files](https://docs.unity3d.com/Manual/LogFiles.html) 
- generated by the Unity Environment to figure what error happened. 
+If you are able to launch the environment from `UnityEnvironment` but then
+receive a timeout error, there may be a number of possible causes.
-### Communication port {} still in use
+* _Cause_: There may be no Brains in your environment which are set to
+  `External`. In this case, the environment will not attempt to communicate
+  with Python. _Solution_: Set the Brain(s) you wish to externally control
+  through the Python API to `External` from the Unity Editor, and rebuild the
+  environment.
+* _Cause_: On OSX, the firewall may be preventing communication with the
+  environment. _Solution_: Add the built environment binary to the list of
+  exceptions on the firewall by following
+  [instructions](https://support.apple.com/en-us/HT201642).
+* _Cause_: An error happened in the Unity Environment preventing communication.
+  _Solution_: Look into the [log
+  files](https://docs.unity3d.com/Manual/LogFiles.html) generated by the Unity
+  Environment to figure out what error occurred.
-If you receive an exception `"Couldn't launch new environment because 
-communication port {} is still in use. "`, you can change the worker 
-number in the Python script when calling 
+## Communication port {} still in use
+
+If you receive an exception `"Couldn't launch new environment because
+communication port {} is still in use. "`, you can change the worker number in
+the Python script when calling
-### Mean reward : nan
+## Mean reward : nan
-If you receive a message `Mean reward : nan` when attempting to train a 
-model using PPO, this is due to the episodes of the learning environment 
-not terminating. In order to address this, set `Max Steps` for either 
-the Academy or Agents within the Scene Inspector to a value greater 
-than 0. Alternatively, it is possible to manually set `done` conditions 
-for episodes from within scripts for custom episode-terminating events.
+If you receive a message `Mean reward : nan` when attempting to train a model
+using PPO, this is due to the episodes of the learning environment not
+terminating. In order to address this, set `Max Steps` for either the Academy or
+Agents within the Scene Inspector to a value greater than 0. Alternatively, it
+is possible to manually set `done` conditions for episodes from within scripts
+for custom episode-terminating events.
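As a rough illustration of why `nan` shows up here, the following is a minimal Python sketch. It assumes the reported mean is taken over the cumulative rewards of episodes that have finished since the last summary; the helper name `mean_reward` is hypothetical and is not part of the ML-Agents API. If no episode ever terminates, there is nothing to average, so the result is not a number.

```python
import math


def mean_reward(completed_episode_rewards):
    """Average the cumulative rewards of episodes that have finished.

    Hypothetical helper for illustration only; not part of the ML-Agents API.
    """
    if not completed_episode_rewards:
        # No episode has terminated yet, so there is nothing to average.
        return float("nan")
    return sum(completed_episode_rewards) / len(completed_episode_rewards)


# With no finished episodes the mean is undefined.
print(math.isnan(mean_reward([])))
# Once episodes terminate (for example, via Max Steps), the mean is defined.
print(mean_reward([1.0, 3.0]))
```

Under this assumption, setting `Max Steps` to a value greater than 0 guarantees that episodes end, so the list of finished episodes is no longer empty.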
--- a/docs/Getting-Started-with-Balance-Ball.md
+++ b/docs/Getting-Started-with-Balance-Ball.md
 # Getting Started with the 3D Balance Ball Environment

-This tutorial walks through the end-to-end process of opening a ML-Agents toolkit
-example environment in Unity, building the Unity executable, training an agent 
-in it, and finally embedding the trained model into the Unity environment. 
+This tutorial walks through the end-to-end process of opening an ML-Agents
+toolkit example environment in Unity, building the Unity executable, training an
+agent in it, and finally embedding the trained model into the Unity environment.
-The ML-Agents toolkit includes a number of [example environments](Learning-Environment-Examples.md) 
-which you can examine to help understand the different ways in which the ML-Agents toolkit 
-can be used. These environments can also serve as templates for new 
-environments or as ways to test new ML algorithms. After reading this tutorial, 
-you should be able to explore and build the example environments.
+The ML-Agents toolkit includes a number of [example
+environments](Learning-Environment-Examples.md) which you can examine to help
+understand the different ways in which the ML-Agents toolkit can be used. These
+environments can also serve as templates for new environments or as ways to test
+new ML algorithms. After reading this tutorial, you should be able to explore
+and build the example environments.
-This walk-through uses the **3D Balance Ball** environment. 3D Balance Ball contains 
-a number of platforms and balls (which are all copies of each other). 
-Each platform tries to keep its ball from falling by rotating either 
-horizontally or vertically. In this environment, a platform is an **agent** 
-that receives a reward for every step that it balances the ball. An agent is 
-also penalized with a negative reward for dropping the ball. The goal of the 
-training process is to have the platforms learn to never drop the ball.
+This walk-through uses the **3D Balance Ball** environment. 3D Balance Ball
+contains a number of platforms and balls (which are all copies of each other).
+Each platform tries to keep its ball from falling by rotating either
+horizontally or vertically. In this environment, a platform is an **agent** that
+receives a reward for every step that it balances the ball. An agent is also
+penalized with a negative reward for dropping the ball. The goal of the training
+process is to have the platforms learn to never drop the ball.
-In order to install and set up the ML-Agents toolkit, the Python dependencies and Unity, 
-see the [installation instructions](Installation.md).
+In order to install and set up the ML-Agents toolkit, the Python dependencies
+and Unity, see the [installation instructions](Installation.md).
-An agent is an autonomous actor that observes and interacts with an 
-_environment_. In the context of Unity, an environment is a scene containing 
-an Academy and one or more Brain and Agent objects, and, of course, the other 
+An agent is an autonomous actor that observes and interacts with an
+_environment_. In the context of Unity, an environment is a scene containing an
+Academy and one or more Brain and Agent objects, and, of course, the other
-**Note:** In Unity, the base object of everything in a scene is the 
-_GameObject_. The GameObject is essentially a container for everything else, 
-including behaviors, graphics, physics, etc. To see the components that make 
-up a GameObject, select the GameObject in the Scene window, and open the 
-Inspector window. The Inspector shows every component on a GameObject. 
- 
-The first thing you may notice after opening the 3D Balance Ball scene is that 
-it contains not one, but several platforms.  Each platform in the scene is an 
-independent agent, but they all share the same brain. 3D Balance Ball does this 
+**Note:** In Unity, the base object of everything in a scene is the
+_GameObject_. The GameObject is essentially a container for everything else,
+including behaviors, graphics, physics, etc. To see the components that make up
+a GameObject, select the GameObject in the Scene window, and open the Inspector
+window. The Inspector shows every component on a GameObject.
+
+The first thing you may notice after opening the 3D Balance Ball scene is that
+it contains not one, but several platforms.  Each platform in the scene is an
+independent agent, but they all share the same brain. 3D Balance Ball does this
-The Academy object for the scene is placed on the Ball3DAcademy GameObject. 
-When you look at an Academy component in the inspector, you can see several 
-properties that control how the environment works. For example, the 
-**Training** and **Inference Configuration**  properties set the graphics and 
-timescale properties for the Unity application. The Academy uses the 
-**Training Configuration**  during training and the **Inference Configuration** 
-when not training. (*Inference* means that the agent is using a trained model 
-or heuristics or direct control — in other words, whenever **not** training.) 
-Typically, you set low graphics quality and a high time scale for the 
-**Training configuration** and a high graphics quality and the timescale to 
-`1.0` for the **Inference Configuration** .
+The Academy object for the scene is placed on the Ball3DAcademy GameObject. When
+you look at an Academy component in the inspector, you can see several
+properties that control how the environment works. For example, the **Training**
+and **Inference Configuration** properties set the graphics and timescale
+properties for the Unity application. The Academy uses the **Training
+Configuration** during training and the **Inference Configuration** when not
+training. (*Inference* means that the agent is using a trained model or
+heuristics or direct control — in other words, whenever **not** training.)
+Typically, you set a low graphics quality and a high timescale for the
+**Training Configuration**, and a high graphics quality and a timescale of
+`1.0` for the **Inference Configuration**.
-**Note:** if you want to observe the environment during training, you can 
-adjust the **Inference Configuration** settings to use a larger window and a 
-timescale closer to 1:1. Be sure to set these parameters back when training in 
-earnest; otherwise, training can take a very long time.
+**Note:** if you want to observe the environment during training, you can adjust
+the **Inference Configuration** settings to use a larger window and a timescale
+closer to 1:1. Be sure to set these parameters back when training in earnest;
+otherwise, training can take a very long time.
-Another aspect of an environment to look at is the Academy implementation. 
-Since the base Academy class is abstract, you must always define a subclass. 
-There are three functions you can implement, though they are all optional:
+Another aspect of an environment to look at is the Academy implementation. Since
+the base Academy class is abstract, you must always define a subclass. There are
+three functions you can implement, though they are all optional:
-* Academy.AcademyStep() — Called at every simulation step before 
-Agent.AgentAction() (and after the agents collect their observations).
-* Academy.AcademyReset() — Called when the Academy starts or restarts the 
-simulation (including the first time).
+* Academy.AcademyStep() — Called at every simulation step before
+  Agent.AgentAction() (and after the agents collect their observations).
+* Academy.AcademyReset() — Called when the Academy starts or restarts the
+  simulation (including the first time).
-The 3D Balance Ball environment does not use these functions — each agent 
-resets itself when needed — but many environments do use these functions to 
-control the environment around the agents.
+The 3D Balance Ball environment does not use these functions — each agent resets
+itself when needed — but many environments do use these functions to control the
+environment around the agents.
-The Ball3DBrain GameObject in the scene, which contains a Brain component, 
-is a child of the Academy object. (All Brain objects in a scene must be 
-children of the Academy.) All the agents in the 3D Balance Ball environment 
-use the same Brain instance. 
-A Brain doesn't store any information about an agent, 
-it just routes the agent's collected observations to the decision making 
-process and returns the chosen action to the agent. Thus, all agents can share 
-the same brain, but act independently. The Brain settings tell you quite a bit 
-about how an agent works.
+The Ball3DBrain GameObject in the scene, which contains a Brain component, is a
+child of the Academy object. (All Brain objects in a scene must be children of
+the Academy.) All the agents in the 3D Balance Ball environment use the same
+Brain instance. A Brain doesn't store any information about an agent; it just
+routes the agent's collected observations to the decision-making process and
+returns the chosen action to the agent. Thus, all agents can share the same
+brain, but act independently. The Brain settings tell you quite a bit about how
+an agent works.
-The **Brain Type** determines how an agent makes its decisions. The 
-**External** and **Internal** types work together — use **External** when 
-training your agents; use **Internal** when using the trained model. 
-The **Heuristic** brain allows you to hand-code the agent's logic by extending 
-the Decision class. Finally, the **Player** brain lets you map keyboard 
-commands to actions, which can be useful when testing your agents and 
-environment. If none of these types of brains do what you need, you can 
-implement your own CoreBrain to create your own type.
+The **Brain Type** determines how an agent makes its decisions. The **External**
+and **Internal** types work together — use **External** when training your
+agents; use **Internal** when using the trained model. The **Heuristic** brain
+allows you to hand-code the agent's logic by extending the Decision class.
+Finally, the **Player** brain lets you map keyboard commands to actions, which
+can be useful when testing your agents and environment. If none of these types
+of brains do what you need, you can implement your own CoreBrain to create your
+own type.
-In this tutorial, you will set the **Brain Type** to **External** for training; 
+In this tutorial, you will set the **Brain Type** to **External** for training;
-**Vector Observation Space**
+#### Vector Observation Space
-Before making a decision, an agent collects its observation about its state 
-in the world. The vector observation is a vector of floating point numbers
-which contain relevant information for the agent to make decisions. 
+Before making a decision, an agent collects its observation about its state in
+the world. The vector observation is a vector of floating point numbers which
+contain relevant information for the agent to make decisions.
-The Brain instance used in the 3D Balance Ball example uses the **Continuous** 
-vector observation space with a **State Size** of 8. This means that the 
-feature vector containing the agent's observations contains eight elements: 
-the `x` and `z` components of the platform's rotation and the `x`, `y`, and `z`
-components of the ball's relative position and velocity. (The observation
-values are defined in the agent's `CollectObservations()` function.)
+The Brain instance used in the 3D Balance Ball example uses the **Continuous**
+vector observation space with a **State Size** of 8. This means that the feature
+vector containing the agent's observations contains eight elements: the `x` and
+`z` components of the platform's rotation and the `x`, `y`, and `z` components
+of the ball's relative position and velocity. (The observation values are
+defined in the agent's `CollectObservations()` function.)
-**Vector Action Space**
+#### Vector Action Space
-An agent is given instructions from the brain in the form of *actions*. 
-ML-Agents toolkit classifies actions into two types: the **Continuous** 
-vector action space is a vector of numbers that can vary continuously. What 
-each element of the vector means is defined by the agent logic (the PPO
-training process just learns what values are better given particular state 
-observations based on the rewards received when it tries different values). 
-For example, an element might represent a force or torque applied to a 
-`RigidBody` in the agent. The **Discrete** action vector space defines its
-actions as tables. An action given to the agent is an array of indeces into 
-tables. 
+An agent is given instructions from the brain in the form of *actions*.
+The ML-Agents toolkit classifies actions into two types: the **Continuous** vector
+action space is a vector of numbers that can vary continuously. What each
+element of the vector means is defined by the agent logic (the PPO training
+process just learns what values are better given particular state observations
+based on the rewards received when it tries different values). For example, an
+element might represent a force or torque applied to a `RigidBody` in the agent.
+The **Discrete** action vector space defines its actions as tables. An action
+given to the agent is an array of indices into tables.
-space. 
-You can try training with both settings to observe whether there is a 
-difference. (Set the `Vector Action Space Size` to 4 when using the discrete 
+space. You can try training with both settings to observe whether there is a
+difference. (Set the `Vector Action Space Size` to 4 when using the discrete
- 
+
-The Agent is the actor that observes and takes actions in the environment. 
-In the 3D Balance Ball environment, the Agent components are placed on the 
-twelve Platform GameObjects. The base Agent object has a few properties that 
-affect its behavior:
+The Agent is the actor that observes and takes actions in the environment. In
+the 3D Balance Ball environment, the Agent components are placed on the twelve
+Platform GameObjects. The base Agent object has a few properties that affect its
+behavior:
-* **Brain** — Every agent must have a Brain. The brain determines how an agent 
-makes decisions. All the agents in the 3D Balance Ball scene share the same 
-brain.
+* **Brain** — Every agent must have a Brain. The brain determines how an agent
+  makes decisions. All the agents in the 3D Balance Ball scene share the same
+  brain.
-observe its environment. 3D Balance Ball does not use camera observations.
-* **Max Step** — Defines how many simulation steps can occur before the agent 
-decides it is done. In 3D Balance Ball, an agent restarts after 5000 steps.
+  observe its environment. 3D Balance Ball does not use camera observations.
+* **Max Step** — Defines how many simulation steps can occur before the agent
+  decides it is done. In 3D Balance Ball, an agent restarts after 5000 steps.
-3D Balance Ball sets this true so that the agent restarts after reaching the 
-**Max Step** count or after dropping the ball.
+  3D Balance Ball sets this true so that the agent restarts after reaching the
+  **Max Step** count or after dropping the ball.
-Perhaps the more interesting aspect of an agent is the Agent subclass 
-implementation. When you create an agent, you must extend the base Agent class. 
+Perhaps the more interesting aspect of an agent is the Agent subclass
+implementation. When you create an agent, you must extend the base Agent class.
-* Agent.AgentReset() — Called when the Agent resets, including at the beginning 
-of a session. The Ball3DAgent class uses the reset function to reset the 
-platform and ball. The function randomizes the reset values so that the 
-training generalizes to more than a specific starting position and platform 
-attitude.
-* Agent.CollectObservations() — Called every simulation step. Responsible for 
-collecting the agent's observations of the environment. Since the Brain 
-instance assigned to the agent is set to the continuous vector observation
-space with a state size of 8, the `CollectObservations()` must call 
-`AddVectorObs` 8 times.
+* Agent.AgentReset() — Called when the Agent resets, including at the beginning
+  of a session. The Ball3DAgent class uses the reset function to reset the
+  platform and ball. The function randomizes the reset values so that the
+  training generalizes to more than a specific starting position and platform
+  attitude.
+* Agent.CollectObservations() — Called every simulation step. Responsible for
+  collecting the agent's observations of the environment. Since the Brain
+  instance assigned to the agent is set to the continuous vector observation
+  space with a state size of 8, the `CollectObservations()` must call
+  `AddVectorObs` 8 times.
-by the brain. The Ball3DAgent example handles both the continuous and the 
-discrete action space types. There isn't actually much difference between the 
-two state types in this environment — both vector action spaces result in a
-small change in platform rotation at each step. The `AgentAction()` function
-assigns a reward to the agent; in this example, an agent receives a small 
-positive reward for each step it keeps the ball on the platform and a larger, 
-negative reward for dropping the ball. An agent is also marked as done when it
-drops the ball so that it will reset with a new ball for the next simulation
-step.
+  by the brain. The Ball3DAgent example handles both the continuous and the
+  discrete action space types. There isn't actually much difference between the
+  two action types in this environment; both vector action spaces result in a
+  small change in platform rotation at each step. The `AgentAction()` function
+  assigns a reward to the agent; in this example, an agent receives a small
+  positive reward for each step it keeps the ball on the platform and a larger,
+  negative reward for dropping the ball. An agent is also marked as done when it
+  drops the ball so that it will reset with a new ball for the next simulation
+  step.

 ## Training the Brain with Reinforcement Learning


-In order to train an agent to correctly balance the ball, we will use a 
-Reinforcement Learning algorithm called Proximal Policy Optimization (PPO). 
-This is a method that has been shown to be safe, efficient, and more general 
-purpose than many other RL algorithms, as such we have chosen it as the 
-example algorithm for use with ML-Agents toolkit. For more information on PPO, 
-OpenAI has a recent [blog post](https://blog.openai.com/openai-baselines-ppo/) 
+In order to train an agent to correctly balance the ball, we will use a
+Reinforcement Learning algorithm called Proximal Policy Optimization (PPO). This
+is a method that has been shown to be safe, efficient, and more general purpose
+than many other RL algorithms; as such, we have chosen it as the example
+algorithm for use with the ML-Agents toolkit. For more information on PPO, OpenAI
+has a recent [blog post](https://blog.openai.com/openai-baselines-ppo/)
+To train the agents within the Ball Balance environment, we will be using the
+Python package. We have provided a convenient Python wrapper script called
+`learn.py` which accepts arguments used to configure both training and inference
+phases.
-To train the agents within the Ball Balance environment, we will be using the python 
-package. We have provided a convenient Python wrapper script called `learn.py` which accepts arguments used to configure both training and inference phases.
-
-We can use `run_id` to identify the experiment and create a folder where the model and summary statistics are stored. When using TensorBoard to observe the training statistics, it helps to set this to a sequential value 
-for each training run. In other words, "BalanceBall1" for the first run, 
-"BalanceBall2" or the second, and so on. If you don't, the summaries for 
-every training run are saved to the same directory and will all be included 
-on the same graph.
+We can use `run_id` to identify the experiment and create a folder where the
+model and summary statistics are stored. When using TensorBoard to observe the
+training statistics, it helps to set this to a sequential value for each
+training run. In other words, "BalanceBall1" for the first run, "BalanceBall2"
+for the second, and so on. If you don't, the summaries for every training run are
+saved to the same directory and will all be included on the same graph.
-To summarize, go to your command line, enter the `ml-agents/python` directory and type: 
+To summarize, go to your command line, enter the `ml-agents` directory and type:
-```
-python3 learn.py --run-id=<run-identifier> --train 
+```shell
+learn trainer_config.yaml --run-id=<run-identifier> --train
-When the message _"Start training by pressing the Play button in the Unity Editor"_ is displayed on the screen, you can press the :arrow_forward: button in Unity to start training in the Editor.
-
-**Note**: If you're using Anaconda, don't forget to activate the ml-agents environment first.
+When the message _"Start training by pressing the Play button in the Unity
+Editor"_ is displayed on the screen, you can press the :arrow_forward: button in
+Unity to start training in the Editor.
-The `--train` flag tells the ML-Agents toolkit to run in training mode. 
+**Note**: If you're using Anaconda, don't forget to activate the ml-agents
+environment first.
-**Note**: You can train using an executable rather than the Editor. To do so, follow the intructions in 
-[Using an Execuatble](Learning-Environment-Executable.md).
+The `--train` flag tells the ML-Agents toolkit to run in training mode.
+**Note**: You can train using an executable rather than the Editor. To do so,
+follow the instructions in [Using an
+Executable](Learning-Environment-Executable.md).
-Once you start training using `learn.py` in the way described in the previous section, the `ml-agents/python` folder will 
-contain a `summaries` directory. In order to observe the training process 
-in more detail, you can use TensorBoard. From the command line navigate to `ml-agents/python` folder and run:
+Once you start training using `learn.py` in the way described in the previous
+section, the `ml-agents` directory will contain a `summaries` directory. In
+order to observe the training process in more detail, you can use TensorBoard.
+From the command line run:
-`tensorboard --logdir=summaries`
+```shell
+tensorboard --logdir=summaries
+```
-* Lesson - only interesting when performing
-[curriculum training](Training-Curriculum-Learning.md). 
-This is not used in the 3D Balance Ball environment. 
-* Cumulative Reward - The mean cumulative episode reward over all agents. 
-Should increase during a successful training session.
-* Entropy - How random the decisions of the model are. Should slowly decrease 
-during a successful training process. If it decreases too quickly, the `beta` 
-hyperparameter should be increased.
-* Episode Length - The mean length of each episode in the environment for all 
-agents.
-* Learning Rate - How large a step the training algorithm takes as it searches 
-for the optimal policy. Should decrease over time.
+* Lesson - only interesting when performing [curriculum
+  training](Training-Curriculum-Learning.md). This is not used in the 3D Balance
+  Ball environment.
+* Cumulative Reward - The mean cumulative episode reward over all agents. Should
+  increase during a successful training session.
+* Entropy - How random the decisions of the model are. Should slowly decrease
+  during a successful training process. If it decreases too quickly, the `beta`
+  hyperparameter should be increased.
+* Episode Length - The mean length of each episode in the environment for all
+  agents.
+* Learning Rate - How large a step the training algorithm takes as it searches
+  for the optimal policy. Should decrease over time.
-much the policy (process for deciding actions) is changing. The magnitude of 
-this should decrease during a successful training session.
-* Value Estimate - The mean value estimate for all states visited by the agent. 
-Should increase during a successful training session.
+  much the policy (process for deciding actions) is changing. The magnitude of
+  this should decrease during a successful training session.
+* Value Estimate - The mean value estimate for all states visited by the agent.
+  Should increase during a successful training session.
-well the model is able to predict the value of each state. This should decrease
-during a successful training session.
+  well the model is able to predict the value of each state. This should
+  decrease during a successful training session.
-Once the training process completes, and the training process saves the model 
-(denoted by the `Saved Model` message) you can add it to the Unity project and 
-use it with agents having an **Internal** brain type.
-**Note:** Do not just close the Unity Window once the `Saved Model` message appears. Either wait for the training process to close the window or press Ctrl+C at the command-line prompt. If you simply close the window manually, the .bytes file containing the trained model is not exported into the ml-agents folder.
+Once the training process completes and saves the model (denoted by the
+`Saved Model` message), you can add it to the Unity project and use it with
+agents having an **Internal** brain type. **Note:** Do not just
+close the Unity Window once the `Saved Model` message appears. Either wait for
+the training process to close the window or press Ctrl+C at the command-line
+prompt. If you simply close the window manually, the .bytes file containing the
+trained model is not exported into the ml-agents folder.
-Because TensorFlowSharp support is still experimental, it is disabled by 
-default. In order to enable it, you must follow these steps. Please note that 
+Because TensorFlowSharp support is still experimental, it is disabled by
+default. In order to enable it, you must follow these steps. Please note that
-To set up the TensorFlowSharp Support, follow [Setting up ML-Agents Toolkit within Unity](Basic-Guide.md#setting-up-ml-agents-within-unity) section.
-of the Basic Guide page.
+To set up the TensorFlowSharp Support, follow the [Setting up ML-Agents Toolkit
+within Unity](Basic-Guide.md#setting-up-ml-agents-within-unity) section of the
+Basic Guide page.
-To embed the trained model into Unity, follow the later part of [Training the Brain with Reinforcement Learning](Basic-Guide.md#training-the-brain-with-reinforcement-learning) section of the Basic Buides page. 
+To embed the trained model into Unity, follow the later part of the [Training the
+Brain with Reinforcement
+Learning](Basic-Guide.md#training-the-brain-with-reinforcement-learning) section
+of the Basic Guide page.
--- a/docs/Glossary.md
+++ b/docs/Glossary.md
 # ML-Agents Toolkit Glossary

- * **Academy** - Unity Component which controls timing, reset, and 
- training/inference settings of the environment. 
- * **Action** - The carrying-out of a decision on the part of an 
- agent within the environment.
- * **Agent** - Unity Component which produces observations and 
- takes actions in the environment. Agents actions are determined 
- by decisions produced by a linked Brain.
- * **Brain** - Unity Component which makes decisions for the agents 
- linked to it.
- * **Decision** - The specification produced by a Brain for an action 
- to be carried out given an observation. 
- * **Editor** - The Unity Editor, which may include any pane 
- (e.g. Hierarchy, Scene, Inspector). 
- * **Environment** - The Unity scene which contains Agents, Academy, 
- and Brains.
- * **FixedUpdate** - Unity method called each time the the game engine 
- is stepped. ML-Agents logic should be placed here.
- * **Frame** - An instance of rendering the main camera for the 
- display. Corresponds to each `Update` call of the game engine.
- * **Observation** - Partial information describing the state of the 
- environment available to a given agent. (e.g. Vector, Visual, Text)
- * **Policy** - Function for producing decisions from observations.
- * **Reward** - Signal provided at every step used to indicate 
- desirability of an agent’s action within the current state 
- of the environment.
- * **State** - The underlying properties of the environment 
- (including all agents within it) at a given time.
- * **Step** - Corresponds to each `FixedUpdate` call of the game engine. 
- Is the smallest atomic change to the state possible.
- * **Update** - Unity function called each time a frame is rendered. 
- ML-Agents logic should not be placed here.
- * **External Coordinator** - ML-Agents class responsible for 
- communication with outside processes (in this case, the Python API).
- * **Trainer** - Python class which is responsible for training a given 
- external brain. Contains TensorFlow graph which makes decisions 
- for external brain.
+* **Academy** - Unity Component which controls timing, reset, and
+  training/inference settings of the environment.
+* **Action** - The carrying-out of a decision on the part of an agent within the
+  environment.
+* **Agent** - Unity Component which produces observations and takes actions in
+  the environment. Agents' actions are determined by decisions produced by a
+  linked Brain.
+* **Brain** - Unity Component which makes decisions for the agents linked to it.
+* **Decision** - The specification produced by a Brain for an action to be
+  carried out given an observation.
+* **Editor** - The Unity Editor, which may include any pane (e.g. Hierarchy,
+  Scene, Inspector).
+* **Environment** - The Unity scene which contains Agents, Academy, and Brains.
+* **FixedUpdate** - Unity method called each time the game engine is
+  stepped. ML-Agents logic should be placed here.
+* **Frame** - An instance of rendering the main camera for the display.
+  Corresponds to each `Update` call of the game engine.
+* **Observation** - Partial information describing the state of the environment
+  available to a given agent (e.g. Vector, Visual, Text).
+* **Policy** - Function for producing decisions from observations.
+* **Reward** - Signal provided at every step used to indicate desirability of an
+  agent’s action within the current state of the environment.
+* **State** - The underlying properties of the environment (including all agents
+  within it) at a given time.
+* **Step** - Corresponds to each `FixedUpdate` call of the game engine. Is the
+  smallest atomic change to the state possible.
+* **Update** - Unity function called each time a frame is rendered. ML-Agents
+  logic should not be placed here.
+* **External Coordinator** - ML-Agents class responsible for communication with
+  outside processes (in this case, the Python API).
+* **Trainer** - Python class which is responsible for training a given external
+  brain. Contains TensorFlow graph which makes decisions for external brain.
--- a/docs/Installation.md
+++ b/docs/Installation.md
 _Linux Build Support_ component when installing Unity.

 <p align="center">
-    <img src="images/unity_linux_build_support.png" 
-        alt="Linux Build Support" 
+    <img src="images/unity_linux_build_support.png"
+        alt="Linux Build Support"
-Once installed, you will want to clone the ML-Agents Toolkit GitHub repository. 
+Once installed, you will want to clone the ML-Agents Toolkit GitHub repository.
-The `unity-environment` directory in this repository contains the Unity Assets
+The `MLAgentsSDK` directory in this repository contains the Unity Assets
-Both directories are located at the root of the repository. 
+Both directories are located at the root of the repository.
-the dependencies listed in the [requirements file](../python/requirements.txt).
+the dependencies listed in the [requirements file](../requirements.txt).
- [TensorFlow](Background-TensorFlow.md) 
- [Jupyter](Background-Jupyter.md) 
+
+- [TensorFlow](Background-TensorFlow.md)
+- [Jupyter](Background-Jupyter.md)
+
+### NOTES
-**NOTES**
- If you are using Anaconda and are having trouble with TensorFlow, please see the following [note](https://www.tensorflow.org/install/install_mac#installing_with_anaconda) on how to install TensorFlow in an Anaconda environment. 
+- If you are using Anaconda and are having trouble with TensorFlow, please see
+  the following
+  [note](https://www.tensorflow.org/install/install_mac#installing_with_anaconda)
+  on how to install TensorFlow in an Anaconda environment.
-If you are a Windows user who is new to Python and TensorFlow, follow [this guide](Installation-Windows.md) to set up your Python environment.
+If you are a Windows user who is new to Python and TensorFlow, follow [this
+guide](Installation-Windows.md) to set up your Python environment.
-[Download](https://www.python.org/downloads/) and install Python 3 if you do not already have it.
+[Download](https://www.python.org/downloads/) and install Python 3 if you do not
+already have it.
-If your Python environment doesn't include `pip`, see these 
+If your Python environment doesn't include `pip`, see these
-To install dependencies, **go into the `python` subdirectory** of the repository,
-and run from the command line:
+To install dependencies, **go into the `python` subdirectory** of the
+repository, and run from the command line:
-If you'd like to use Docker for ML-Agents, please follow 
-[this guide](Using-Docker.md). 
+If you'd like to use Docker for ML-Agents, please follow
+[this guide](Using-Docker.md).
-The [Basic Guide](Basic-Guide.md) page contains several short 
-tutorials on setting up the ML-Agents toolkit within Unity, running a pre-trained model, in
+The [Basic Guide](Basic-Guide.md) page contains several short tutorials on
+setting up the ML-Agents toolkit within Unity, running a pre-trained model, in
-If you run into any problems regarding ML-Agents, refer to our [FAQ](FAQ.md) and our [Limitations](Limitations.md) pages. If you can't find anything please
+If you run into any problems regarding ML-Agents, refer to our [FAQ](FAQ.md) and
+our [Limitations](Limitations.md) pages. If you can't find anything please
-make sure to cite relevant information on OS, Python version, and exact error 
-message (whenever possible). 
+make sure to cite relevant information on OS, Python version, and exact error
+message (whenever possible).
--- a/docs/Learning-Environment-Create-New.md
+++ b/docs/Learning-Environment-Create-New.md

 2. In a file system window, navigate to the folder containing your cloned ML-Agents repository. 

-3. Drag the `ML-Agents` folder from `unity-environments/Assets` to the Unity Editor Project window.
+3. Drag the `ML-Agents` folder from `MLAgentsSDK/Assets` to the Unity Editor Project window.

 Your Unity **Project** window should contain the following assets:


 Press **Play** to run the scene and use the WASD keys to move the agent around the platform. Make sure that there are no errors displayed in the Unity editor Console window and that the agent resets when it reaches its target or falls from the platform. Note that for more involved debugging, the ML-Agents SDK includes a convenient Monitor class that you can use to easily display agent status information in the Game window.

-One additional test you can perform is to first ensure that your environment 
-and the Python API work as expected using the `python/Basics` 
-[Jupyter notebook](Background-Jupyter.md). Within `Basics`, be sure to set 
-`env_name` to the name of the environment file you specify when building
-this environment.
+One additional test you can perform is to first ensure that your environment and
+the Python API work as expected using the `notebooks/getting-started.ipynb`
+[Jupyter notebook](Background-Jupyter.md). Within the notebook, be sure to set
+`env_name` to the name of the environment file you specify when building this
+environment.

 Now you can train the Agent. To get ready for training, you must first change the **Brain Type** from **Player** to **External**. From there, the process is the same as described in [Training ML-Agents](Training-ML-Agents.md). 

--- a/docs/Learning-Environment-Examples.md
+++ b/docs/Learning-Environment-Examples.md

 The Unity ML-Agents toolkit contains an expanding set of example environments which
 demonstrate various features of the platform. Environments are located in 
-`unity-environment/Assets/ML-Agents/Examples` and summarized below. 
+`MLAgentsSDK/Assets/ML-Agents/Examples` and summarized below. 
 Additionally, our 
 [first ML Challenge](https://connect.unity.com/challenges/ml-agents-1)
 contains environments created by the community.
--- a/docs/Learning-Environment-Executable.md
+++ b/docs/Learning-Environment-Executable.md

 1. Launch Unity.
 2. On the Projects dialog, choose the **Open** option at the top of the window.
-3. Using the file dialog that opens, locate the `unity-environment` folder 
+3. Using the file dialog that opens, locate the `MLAgentsSDK` folder 
 within the ML-Agents project and click **Open**.
 4. In the **Project** window, navigate to the folder 
 `Assets/ML-Agents/Examples/3DBall/`.

 ![Training running](images/training-running.png)

-You can press Ctrl+C to stop the training, and your trained model will be at `ml-agents/python/models/<run-identifier>/<env_name>_<run-identifier>.bytes`, which corresponds to your model's latest checkpoint. You can now embed this trained model into your internal brain by following the steps below:
+You can press Ctrl+C to stop the training, and your trained model will be at `models/<run-identifier>/<env_name>_<run-identifier>.bytes`, which corresponds to your model's latest checkpoint. You can now embed this trained model into your internal brain by following the steps below:
-`unity-environment/Assets/ML-Agents/Examples/3DBall/TFModels/`.
+`MLAgentsSDK/Assets/ML-Agents/Examples/3DBall/TFModels/`.
 2. Open the Unity Editor, and select the **3DBall** scene as described above.
 3. Select the **Ball3DBrain** object from the Scene hierarchy.
 4. Change the **Type of Brain** to **Internal**.
--- a/docs/ML-Agents-Overview.md
+++ b/docs/ML-Agents-Overview.md
 # ML-Agents Toolkit Overview

-**The Unity Machine Learning Agents Toolkit** (ML-Agents Toolkit) is an open-source Unity plugin 
-that enables games and simulations to serve as environments for training
-intelligent agents. Agents can be trained using reinforcement learning,
-imitation learning, neuroevolution, or other machine learning methods through
-a simple-to-use Python API. We also provide implementations (based on
-TensorFlow) of state-of-the-art algorithms to enable game developers
-and hobbyists to easily train intelligent agents for 2D, 3D and VR/AR games.
-These trained agents can be used for multiple purposes, including
+**The Unity Machine Learning Agents Toolkit** (ML-Agents Toolkit) is an
+open-source Unity plugin that enables games and simulations to serve as
+environments for training intelligent agents. Agents can be trained using
+reinforcement learning, imitation learning, neuroevolution, or other machine
+learning methods through a simple-to-use Python API. We also provide
+implementations (based on TensorFlow) of state-of-the-art algorithms to enable
+game developers and hobbyists to easily train intelligent agents for 2D, 3D and
+VR/AR games. These trained agents can be used for multiple purposes, including
-design decisions pre-release. The ML-Agents toolkit is mutually beneficial for both game
-developers and AI researchers as it provides a central platform where advances
-in AI can be evaluated on Unity’s rich environments and then made accessible
-to the wider research and game developer communities. 
+design decisions pre-release. The ML-Agents toolkit is mutually beneficial for
+both game developers and AI researchers as it provides a central platform where
+advances in AI can be evaluated on Unity’s rich environments and then made
+accessible to the wider research and game developer communities.
-Depending on your background (i.e. researcher, game developer, hobbyist),
-you may have very different questions on your mind at the moment.
-To make your transition to the ML-Agents toolkit easier, we provide several background
-pages that include overviews and helpful resources on the 
-[Unity Engine](Background-Unity.md), 
-[machine learning](Background-Machine-Learning.md) and 
-[TensorFlow](Background-TensorFlow.md). We **strongly** recommend browsing
-the relevant background pages if you're not familiar with a Unity scene, 
-basic machine learning concepts or have not previously heard of TensorFlow.
+Depending on your background (e.g. researcher, game developer, hobbyist), you
+may have very different questions on your mind at the moment. To make your
+transition to the ML-Agents toolkit easier, we provide several background pages
+that include overviews and helpful resources on the [Unity
+Engine](Background-Unity.md), [machine learning](Background-Machine-Learning.md)
+and [TensorFlow](Background-TensorFlow.md). We **strongly** recommend browsing
+the relevant background pages if you're not familiar with a Unity scene, basic
+machine learning concepts or have not previously heard of TensorFlow.
-components, different training modes and scenarios. By the end of it, you
-should have a good sense of _what_ the ML-Agents toolkit allows you to do. The subsequent
-documentation pages provide examples of _how_ to use ML-Agents.
+components, different training modes and scenarios. By the end of it, you should
+have a good sense of _what_ the ML-Agents toolkit allows you to do. The
+subsequent documentation pages provide examples of _how_ to use ML-Agents.
-hypothetical, running example throughout. We will explore the
-problem of training the behavior of a non-playable character (NPC) in a game.
-(An NPC is a game character that is never controlled by a human player and
-its behavior is pre-defined by the game developer.) More specifically, let's
-assume we're building a multi-player, war-themed game in which players control 
-the soldiers. In this game, we have a single NPC who serves as a medic, finding 
-and reviving wounded players. Lastly, let us assume that there
-are two teams, each with five players and one NPC medic.
+hypothetical, running example throughout. We will explore the problem of
+training the behavior of a non-playable character (NPC) in a game. (An NPC is a
+game character that is never controlled by a human player and whose behavior is
+pre-defined by the game developer.) More specifically, let's assume we're
+building a multi-player, war-themed game in which players control the soldiers.
+In this game, we have a single NPC who serves as a medic, finding and reviving
+wounded players. Lastly, let us assume that there are two teams, each with five
+players and one NPC medic.
-location. Second, it needs to be aware of which of its team members are 
-injured and require assistance. In the case of multiple injuries, it needs to
-assess the degree of injury and decide who to help first. Lastly, a good
-medic will always place itself in a position where it can quickly help its 
-team members. Factoring in all of these traits means that at every instance, 
-the medic needs to measure several attributes of the environment (e.g. 
-position of team members, position of enemies, which of its team members are 
-injured and to what degree) and then decide on an action (e.g. hide from enemy 
-fire, move to help one of its members). Given the large number of settings of 
-the environment and the large number of actions that the medic can take,
-defining and implementing such complex behaviors by hand is challenging and 
-prone to errors.
+location. Second, it needs to be aware of which of its team members are injured
+and require assistance. In the case of multiple injuries, it needs to assess the
+degree of injury and decide who to help first. Lastly, a good medic will always
+place itself in a position where it can quickly help its team members. Factoring
+in all of these traits means that at every instant, the medic needs to measure
+several attributes of the environment (e.g. position of team members, position
+of enemies, which of its team members are injured and to what degree) and then
+decide on an action (e.g. hide from enemy fire, move to help one of its
+members). Given the large number of settings of the environment and the large
+number of actions that the medic can take, defining and implementing such
+complex behaviors by hand is challenging and prone to errors.
-With ML-Agents, it is possible to _train_ the behaviors of such NPCs 
-(called **agents**) using a variety of methods. The basic idea is quite simple. 
-We need to define three entities at every moment of the game 
-(called **environment**):
+With ML-Agents, it is possible to _train_ the behaviors of such NPCs (called
+**agents**) using a variety of methods. The basic idea is quite simple. We need
+to define three entities at every moment of the game (called **environment**):
+
-Observations can be numeric and/or visual. Numeric observations measure 
-attributes of the environment from the point of view of the agent. For
-our medic this would be attributes of the battlefield that are visible to it.
-For most interesting environments, an agent will require 
-several continuous numeric observations. 
-Visual observations, on the other hand, are images generated from the cameras 
-attached to the agent and represent what the agent is seeing at that point
-in time. It is common to confuse an agent's observation with the environment
-(or game) **state**. The environment state represents information about the
-entire scene containing all the game characters. The agents observation, 
-however, only contains information that the agent is aware of and is typically 
-a subset of the environment state. For example, the medic observation cannot
-include information about an enemy in hiding that the medic is unaware of.
- **Actions** - what actions the medic can take. Similar
-to observations, actions can either be continuous or discrete depending
-on the complexity of the environment and agent. In the case of the medic,
-if the environment is a simple grid world where only their location matters,
-then a discrete action taking on one of four values (north, south, east, west)
-suffices. However, if the environment is more complex and the medic can move
-freely then using two continuous actions (one for direction and another
-for speed) is more appropriate.
+  Observations can be numeric and/or visual. Numeric observations measure
+  attributes of the environment from the point of view of the agent. For our
+  medic this would be attributes of the battlefield that are visible to it. For
+  most interesting environments, an agent will require several continuous
+  numeric observations. Visual observations, on the other hand, are images
+  generated from the cameras attached to the agent and represent what the agent
+  is seeing at that point in time. It is common to confuse an agent's
+  observation with the environment (or game) **state**. The environment state
+  represents information about the entire scene containing all the game
+  characters. The agent's observation, however, only contains information that
+  the agent is aware of and is typically a subset of the environment state. For
+  example, the medic's observation cannot include information about an enemy in
+  hiding that the medic is unaware of.
+- **Actions** - what actions the medic can take. Similar to observations,
+  actions can either be continuous or discrete depending on the complexity of
+  the environment and agent. In the case of the medic, if the environment is a
+  simple grid world where only their location matters, then a discrete action
+  taking on one of four values (north, south, east, west) suffices. However, if
+  the environment is more complex and the medic can move freely then using two
+  continuous actions (one for direction and another for speed) is more
+  appropriate.
-Note that the reward signal need not be 
-provided at every moment, but only when the medic performs an action that is 
-good or bad. For example, it can receive a large negative reward if it dies, 
-a modest positive reward whenever it revives a wounded team member, and a
-modest negative reward when a wounded team member dies due to lack of
-assistance. Note that the reward signal is how the objectives of the task
-are communicated to the agent, so they need to be set up in a manner where
-maximizing reward generates the desired optimal behavior.
+  Note that the reward signal need not be provided at every moment, but only
+  when the medic performs an action that is good or bad. For example, it can
+  receive a large negative reward if it dies, a modest positive reward whenever
+  it revives a wounded team member, and a modest negative reward when a wounded
+  team member dies due to lack of assistance. Note that the reward signal is how
+  the objectives of the task are communicated to the agent, so they need to be
+  set up in a manner where maximizing reward generates the desired optimal
+  behavior.
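
The medic's sparse reward signal described above could be sketched roughly as follows. This is an illustrative Python sketch, not ML-Agents code: `StepEvents`, `medic_reward` and the three reward constants are hypothetical names and values, since the text only specifies "large" and "modest" magnitudes.

```python
from dataclasses import dataclass

# Hypothetical magnitudes: the text only says "large negative" and "modest".
REWARD_MEDIC_DIED = -1.0
REWARD_REVIVED_TEAMMATE = 0.2
REWARD_TEAMMATE_DIED = -0.2


@dataclass
class StepEvents:
    """What happened to the medic during one simulation step (illustrative)."""
    medic_died: bool = False
    teammates_revived: int = 0
    teammates_lost: int = 0


def medic_reward(events: StepEvents) -> float:
    """Return the reward for one step; steps with no notable event yield 0.0."""
    reward = 0.0
    if events.medic_died:
        reward += REWARD_MEDIC_DIED
    reward += REWARD_REVIVED_TEAMMATE * events.teammates_revived
    reward += REWARD_TEAMMATE_DIED * events.teammates_lost
    return reward
```

Most steps produce a reward of zero, which matches the point that the signal need not be provided at every moment.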
-After defining these three entities (the building blocks of a 
-**reinforcement learning task**), 
-we can now _train_ the medic's behavior. This is achieved by simulating the
-environment for many trials where the medic, over time, learns what is the 
-optimal action to take for every observation it measures by maximizing 
-its future reward. The key is that by learning the actions that maximize its
-reward, the medic is learning the behaviors that make it a good medic (i.e.
-one who saves the most number of lives). In **reinforcement learning**
-terminology, the behavior that is learned is called a **policy**, which is
-essentially a (optimal) mapping from observations to actions. Note that
+After defining these three entities (the building blocks of a **reinforcement
+learning task**), we can now _train_ the medic's behavior. This is achieved by
+simulating the environment for many trials where the medic, over time, learns
+the optimal action to take for every observation it measures by maximizing its
+future reward. The key is that by learning the actions that maximize its
+reward, the medic is learning the behaviors that make it a good medic (i.e. one
+who saves the most lives). In **reinforcement learning** terminology, the
+behavior that is learned is called a **policy**, which is essentially an
+(optimal) mapping from observations to actions. Note that
-**training phase**, while playing the game with an NPC that is using its 
-learned policy is called the **inference phase**.
+**training phase**, while playing the game with an NPC that is using its learned
+policy is called the **inference phase**.
-The ML-Agents toolkit provides all the necessary tools for using Unity as the simulation 
-engine for learning the policies of different objects in a Unity environment.
-In the next few sections, we discuss how the ML-Agents toolkit achieves this and what
-features it provides.
+The ML-Agents toolkit provides all the necessary tools for using Unity as the
+simulation engine for learning the policies of different objects in a Unity
+environment. In the next few sections, we discuss how the ML-Agents toolkit
+achieves this and what features it provides.
-The ML-Agents toolkit is a Unity plugin that contains three high-level components: 
-* **Learning Environment** - which contains the Unity scene and all the game
-characters. 
-* **Python API** - which contains all the machine learning algorithms that are 
-used for training (learning a behavior or policy). Note that, unlike 
-the Learning Environment, the Python API is not part of Unity, but lives
-outside and communicates with Unity through the External Communicator.
-* **External Communicator** - which connects the Learning Environment
-with the Python API. It lives within the Learning Environment.
+The ML-Agents toolkit is a Unity plugin that contains three high-level
+components:
+
+- **Learning Environment** - which contains the Unity scene and all the game
+  characters.
+- **Python API** - which contains all the machine learning algorithms that are
+  used for training (learning a behavior or policy). Note that, unlike the
+  Learning Environment, the Python API is not part of Unity, but lives outside
+  and communicates with Unity through the External Communicator.
+- **External Communicator** - which connects the Learning Environment with the
+  Python API. It lives within the Learning Environment.

 <p align="center">
    <img src="images/learning_environment_basic.png" 
 _Simplified block diagram of ML-Agents._

 The Learning Environment contains three additional components that help
-organize the Unity scene: 
-* **Agents** - which is attached to a Unity GameObject (any character within a 
-scene) and handles generating its observations, performing the actions it 
-receives and assigning a reward (positive / negative) when appropriate. 
-Each Agent is linked to exactly one Brain.
-* **Brains** - which encapsulates the logic for making decisions for the Agent.
-In essence, the Brain is what holds on to the policy for each Agent and
-determines which actions the Agent should take at each instance. More
-specifically, it is the component that receives the observations and rewards
-from the Agent and returns an action. 
-* **Academy** - which orchestrates the observation and decision making process.
-Within the Academy, several environment-wide parameters such as the rendering
-quality and the speed at which the environment is run can be specified. The
-External Communicator lives within the Academy.
+organize the Unity scene:
-Every Learning Environment will always have one global Academy and one Agent
-for every character in the scene. While each Agent must be linked to a Brain,
-it is possible for Agents that have similar observations and actions to be 
-linked to the same Brain. In our sample game, we have two teams each with
-their own medic. Thus we will have two Agents in our Learning Environment, 
-one for each medic, but both of these medics can be linked to the same Brain.
-Note that these two medics are linked to the same Brain because their _space_
-of observations and actions are similar. This does not mean that at each
-instance they will have identical observation and action _values_. In other
-words, the Brain defines the space of all possible observations and actions, 
-while the Agents connected to it (in this case the medics) can each have
-their own, unique observation and action values. If we expanded our game 
-to include tank driver NPCs, then the Agent attached to those characters 
-cannot share a Brain with the Agent linked to the medics (medics and drivers 
-have different actions).
+- **Agents** - which is attached to a Unity GameObject (any character within a
+  scene) and handles generating its observations, performing the actions it
+  receives and assigning a reward (positive / negative) when appropriate. Each
+  Agent is linked to exactly one Brain.
+- **Brains** - which encapsulates the logic for making decisions for the Agent.
+  In essence, the Brain is what holds on to the policy for each Agent and
+  determines which actions the Agent should take at each instant. More
+  specifically, it is the component that receives the observations and rewards
+  from the Agent and returns an action.
+- **Academy** - which orchestrates the observation and decision making process.
+  Within the Academy, several environment-wide parameters such as the rendering
+  quality and the speed at which the environment is run can be specified. The
+  External Communicator lives within the Academy.
+
+Every Learning Environment will always have one global Academy and one Agent for
+every character in the scene. While each Agent must be linked to a Brain, it is
+possible for Agents that have similar observations and actions to be linked to
+the same Brain. In our sample game, we have two teams each with their own medic.
+Thus we will have two Agents in our Learning Environment, one for each medic,
+but both of these medics can be linked to the same Brain. Note that these two
+medics are linked to the same Brain because their _spaces_ of observations and
+actions are similar. This does not mean that at each instant they will have
+identical observation and action _values_. In other words, the Brain defines the
+space of all possible observations and actions, while the Agents connected to it
+(in this case the medics) can each have their own, unique observation and action
+values. If we expanded our game to include tank driver NPCs, then the Agent
+attached to those characters cannot share a Brain with the Agent linked to the
+medics (medics and drivers have different actions).
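
The distinction between a shared _space_ and per-Agent _values_ can be illustrated with a small Python sketch. The classes `BrainSpec` and `AgentSketch` and the sizes used are hypothetical and only mirror the description above; they are not the toolkit's C# classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BrainSpec:
    """The *space* a Brain defines: sizes only, no concrete values."""
    observation_size: int
    action_size: int


@dataclass
class AgentSketch:
    """An Agent holds its own observation values but points at a shared Brain."""
    name: str
    brain: BrainSpec
    observation: List[float]

    def __post_init__(self) -> None:
        # An Agent can only use a Brain whose observation space fits its values.
        if len(self.observation) != self.brain.observation_size:
            raise ValueError(f"{self.name}: observation does not fit the Brain's space")


# Both medics share one Brain (same space) but observe different values.
medic_brain = BrainSpec(observation_size=3, action_size=2)
medic_a = AgentSketch("medic_team_a", medic_brain, [0.1, 0.5, 0.0])
medic_b = AgentSketch("medic_team_b", medic_brain, [0.9, 0.2, 1.0])
```

A tank driver with a different number of actions would need its own `BrainSpec` in this sketch, mirroring the rule that medics and drivers cannot share a Brain.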
-    <img src="images/learning_environment_example.png" 
-        alt="Example ML-Agents Scene Block Diagram" 
+    <img src="images/learning_environment_example.png"
+        alt="Example ML-Agents Scene Block Diagram"
-We have yet to discuss how the ML-Agents toolkit trains behaviors, and what role the
-Python API and External Communicator play. Before we dive into those details,
-let's summarize the earlier components. Each character is attached to an Agent,
-and each Agent is linked to a Brain. The Brain receives observations and
-rewards from the Agent and returns actions. The Academy ensures that all the
+We have yet to discuss how the ML-Agents toolkit trains behaviors, and what role
+the Python API and External Communicator play. Before we dive into those
+details, let's summarize the earlier components. Each character is attached to
+an Agent, and each Agent is linked to a Brain. The Brain receives observations
+and rewards from the Agent and returns actions. The Academy ensures that all the
-* **External** - where decisions are made using the Python API. Here, the
-observations and rewards collected by the Brain are forwarded to the Python
-API through the External Communicator. The Python API then returns the
-corresponding action that needs to be taken by the Agent.
-* **Internal** - where decisions are made using an embedded 
-[TensorFlow](Background-TensorFlow.md) model. 
-The embedded TensorFlow model represents a learned policy and the Brain 
-directly uses this model to determine the action for each Agent.
-* **Player** - where decisions are made using real input from a keyboard or 
-controller. Here, a human player is controlling the Agent and the observations
-and rewards collected by the Brain are not used to control the Agent.
-* **Heuristic** - where decisions are made using hard-coded behavior. This
-resembles how most character behaviors are currently defined and can be
-helpful for debugging or comparing how an Agent with hard-coded rules compares
-to an Agent whose behavior has been trained. In our example, once we have
-trained a Brain for the medics we could assign a medic on one team to the
-trained Brain and assign the medic on the other team a Heuristic Brain
-with hard-coded behaviors. We can then evaluate which medic is more effective.
+
+- **External** - where decisions are made using the Python API. Here, the
+  observations and rewards collected by the Brain are forwarded to the Python
+  API through the External Communicator. The Python API then returns the
+  corresponding action that needs to be taken by the Agent.
+- **Internal** - where decisions are made using an embedded
+  [TensorFlow](Background-TensorFlow.md) model. The embedded TensorFlow model
+  represents a learned policy and the Brain directly uses this model to
+  determine the action for each Agent.
+- **Player** - where decisions are made using real input from a keyboard or
+  controller. Here, a human player is controlling the Agent and the observations
+  and rewards collected by the Brain are not used to control the Agent.
+- **Heuristic** - where decisions are made using hard-coded behavior. This
+  resembles how most character behaviors are currently defined and can be
+  helpful for debugging or comparing how an Agent with hard-coded rules compares
+  to an Agent whose behavior has been trained. In our example, once we have
+  trained a Brain for the medics we could assign a medic on one team to the
+  trained Brain and assign the medic on the other team a Heuristic Brain with
+  hard-coded behaviors. We can then evaluate which medic is more effective.
-As currently described, it may seem that the External Communicator
-and Python API are only leveraged by the External Brain. This is not true.
-It is possible to configure the Internal, Player and Heuristic Brains to
-also send the observations, rewards and actions to the Python API through
-the External Communicator (a feature called _broadcasting_). As we will see 
-shortly, this enables additional training modes.
+As currently described, it may seem that the External Communicator and Python
+API are only leveraged by the External Brain. This is not true. It is possible
+to configure the Internal, Player and Heuristic Brains to also send the
+observations, rewards and actions to the Python API through the External
+Communicator (a feature called _broadcasting_). As we will see shortly, this
+enables additional training modes.
-    <img src="images/learning_environment.png" 
-        alt="ML-Agents Scene Block Diagram" 
+    <img src="images/learning_environment.png"
+        alt="ML-Agents Scene Block Diagram"
        border="10" />
 </p>


 ### Built-in Training and Inference

-As mentioned previously, the ML-Agents toolkit ships with several implementations of 
-state-of-the-art algorithms for training intelligent agents. In this mode, the 
-Brain type is set to External during training and Internal during inference. 
-More specifically, during training, all the medics in the scene send their
-observations to the Python API through the External Communicator (this is the 
-behavior with an External Brain). The Python API processes these observations
-and sends back actions for each medic to take. During training these actions 
-are mostly exploratory to help the Python API learn the best policy for each 
-medic. Once training concludes, the learned policy for each medic can be 
-exported. Given that all our implementations are based on TensorFlow, the 
-learned policy is just a TensorFlow model file. Then during the inference 
-phase, we switch the Brain type to Internal and include the TensorFlow model 
-generated from the training phase. Now during the inference phase, the medics 
-still  continue to generate their observations, but instead of being sent to
-the Python API, they will be fed into their (internal, embedded) model to
-generate the _optimal_ action for each medic to take at every point in time.
+As mentioned previously, the ML-Agents toolkit ships with several
+implementations of state-of-the-art algorithms for training intelligent agents.
+In this mode, the Brain type is set to External during training and Internal
+during inference. More specifically, during training, all the medics in the
+scene send their observations to the Python API through the External
+Communicator (this is the behavior with an External Brain). The Python API
+processes these observations and sends back actions for each medic to take.
+During training these actions are mostly exploratory to help the Python API
+learn the best policy for each medic. Once training concludes, the learned
+policy for each medic can be exported. Given that all our implementations are
+based on TensorFlow, the learned policy is just a TensorFlow model file. Then
+during the inference phase, we switch the Brain type to Internal and include the
+TensorFlow model generated from the training phase. Now during the inference
+phase, the medics still continue to generate their observations, but instead of
+being sent to the Python API, they will be fed into their (internal, embedded)
+model to generate the _optimal_ action for each medic to take at every point in
+time.
-To summarize: our built-in implementations are based on TensorFlow, thus,
-during training the Python API uses the observations it receives to learn
-a TensorFlow model. This model is then embedded within the Internal Brain
-during inference to generate the optimal actions for all Agents linked to
-that Brain. **Note that our Internal Brain is currently experimental as it
-is limited to TensorFlow models and leverages the third-party 
-[TensorFlowSharp](https://github.com/migueldeicaza/TensorFlowSharp)
-library.**
+To summarize: our built-in implementations are based on TensorFlow, thus, during
+training the Python API uses the observations it receives to learn a TensorFlow
+model. This model is then embedded within the Internal Brain during inference to
+generate the optimal actions for all Agents linked to that Brain. **Note that
+our Internal Brain is currently experimental as it is limited to TensorFlow
+models and leverages the third-party
+[TensorFlowSharp](https://github.com/migueldeicaza/TensorFlowSharp) library.**

 The
 [Getting Started with the 3D Balance Ball Example](Getting-Started-with-Balance-Ball.md)

-In the previous mode, the External Brain type was used for training 
-to generate a TensorFlow model that the Internal Brain type can understand
-and use. However, any user of the ML-Agents toolkit can leverage their own algorithms
-for both training and inference. In this case, the Brain type would be set
-to External for both training and inferences phases and the behaviors of
-all the Agents in the scene will be controlled within Python.
+In the previous mode, the External Brain type was used for training to generate
+a TensorFlow model that the Internal Brain type can understand and use. However,
+any user of the ML-Agents toolkit can leverage their own algorithms for both
+training and inference. In this case, the Brain type would be set to External
+for both the training and inference phases and the behaviors of all the Agents in
+the scene will be controlled within Python.

 We do not currently have a tutorial highlighting this mode, but you can
 learn more about the Python API [here](Python-API.md).
-This mode is an extension of _Built-in Training and Inference_, and
-is particularly helpful when training intricate behaviors for complex
-environments. Curriculum learning is a way of training a machine learning
-model where more difficult aspects of a problem are gradually introduced in 
-such a way that the model is always optimally challenged. This idea has been 
-around for a long time, and it is how we humans typically learn. If you
-imagine any childhood primary school education, there is an ordering of
-classes and topics. Arithmetic is taught before algebra, for example. 
-Likewise, algebra is taught before calculus. The skills and knowledge learned 
-in the earlier subjects provide a scaffolding for later lessons. The same 
-principle can be applied to machine learning, where training on easier tasks 
-can provide a scaffolding for harder tasks in the future. 
+This mode is an extension of _Built-in Training and Inference_, and is
+particularly helpful when training intricate behaviors for complex environments.
+Curriculum learning is a way of training a machine learning model where more
+difficult aspects of a problem are gradually introduced in such a way that the
+model is always optimally challenged. This idea has been around for a long time,
+and it is how we humans typically learn. If you imagine any childhood primary
+school education, there is an ordering of classes and topics. Arithmetic is
+taught before algebra, for example. Likewise, algebra is taught before calculus.
+The skills and knowledge learned in the earlier subjects provide a scaffolding
+for later lessons. The same principle can be applied to machine learning, where
+training on easier tasks can provide a scaffolding for harder tasks in the
+future.
-    <img src="images/math.png" 
-        alt="Example Math Curriculum" 
+    <img src="images/math.png"
+        alt="Example Math Curriculum"
-_Example of a mathematics curriculum. Lessons progress from simpler topics to more 
-complex ones, with each building on the last._
+_Example of a mathematics curriculum. Lessons progress from simpler topics to
+more complex ones, with each building on the last._
-When we think about how reinforcement learning actually works, the 
-learning signal is reward received occasionally throughout training. 
-The starting point when training an agent to accomplish this task will be a 
-random policy. That starting policy will have the agent running in circles,
-and will likely never, or very rarely achieve the reward for complex
-environments. Thus by simplifying the environment at the beginning of training,
-we allow the agent to quickly update the random policy to a more meaningful
-one that is successively improved as the environment gradually increases in
-complexity. In our example, we can imagine first training the medic when each 
-team only contains one player, and then iteratively increasing the number of
-players (i.e. the environment complexity). The ML-Agents toolkit supports setting
-custom environment parameters within the Academy. This allows
-elements of the environment related to difficulty or complexity to be
-dynamically adjusted based on training progress.
+When we think about how reinforcement learning actually works, the learning
+signal is reward received occasionally throughout training. The starting point
+when training an agent to accomplish this task will be a random policy. That
+starting policy will have the agent running in circles, and will likely never,
+or very rarely, achieve the reward for complex environments. Thus by simplifying
+the environment at the beginning of training, we allow the agent to quickly
+update the random policy to a more meaningful one that is successively improved
+as the environment gradually increases in complexity. In our example, we can
+imagine first training the medic when each team only contains one player, and
+then iteratively increasing the number of players (i.e. the environment
+complexity). The ML-Agents toolkit supports setting custom environment
+parameters within the Academy. This allows elements of the environment related
+to difficulty or complexity to be dynamically adjusted based on training
+progress.
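
As a rough illustration of adjusting difficulty based on training progress, the Python sketch below advances through lessons when the mean reward crosses a threshold. The function `next_lesson`, the lesson list and the threshold values are assumptions made for this example; the toolkit's actual curriculum configuration is covered in the tutorial linked below.

```python
from typing import Sequence

# Hypothetical curriculum: players per team grows as lessons advance.
PLAYERS_PER_TEAM = [1, 2, 3, 5]   # lessons 0..3
THRESHOLDS = [0.5, 0.6, 0.7]      # mean reward needed to leave lessons 0, 1, 2


def next_lesson(current_lesson: int, mean_reward: float,
                thresholds: Sequence[float]) -> int:
    """Advance one lesson once the mean reward reaches the current threshold."""
    if current_lesson < len(thresholds) and mean_reward >= thresholds[current_lesson]:
        return current_lesson + 1
    # Either the agent is not ready yet or this is already the final lesson.
    return current_lesson
```

For example, `PLAYERS_PER_TEAM[next_lesson(0, 0.55, THRESHOLDS)]` selects two players per team, while a mean reward below the threshold keeps the environment at its current difficulty.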

 The [Training with Curriculum Learning](Training-Curriculum-Learning.md)
 tutorial covers this training mode with the **Wall Area** sample environment.
-It is often more intuitive to simply demonstrate the behavior we
-want an agent to perform, rather than attempting to have it learn via
-trial-and-error methods. For example, instead of training the medic by
-setting up its reward function, this mode allows providing real examples from
-a game controller on how the medic should behave. More specifically,
-in this mode, the Brain type during training is set to Player and all the
-actions performed with the controller (in addition to the agent observations)
-will be recorded and sent to the Python API. The imitation learning algorithm
-will then use these pairs of observations and actions from the human player
-to learn a policy. [Video Link](https://youtu.be/kpb8ZkMBFYs).
+It is often more intuitive to simply demonstrate the behavior we want an agent
+to perform, rather than attempting to have it learn via trial-and-error methods.
+For example, instead of training the medic by setting up its reward function,
+this mode allows providing real examples from a game controller on how the medic
+should behave. More specifically, in this mode, the Brain type during training
+is set to Player and all the actions performed with the controller (in addition
+to the agent observations) will be recorded and sent to the Python API. The
+imitation learning algorithm will then use these pairs of observations and
+actions from the human player to learn a policy. [Video
+Link](https://youtu.be/kpb8ZkMBFYs).
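
To make the idea of learning from recorded observation-action pairs concrete, here is a deliberately simple Python sketch. It uses a nearest-neighbor lookup as a stand-in and is not the imitation learning algorithm shipped with the toolkit; the observation layout (distance to a wounded teammate, enemy in sight) and the action names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One demonstration: (observation recorded from the game, action the human took).
Demonstration = Tuple[Sequence[float], str]


def nearest_neighbor_policy(demos: List[Demonstration],
                            observation: Sequence[float]) -> str:
    """Return the action the human took in the most similar recorded observation."""
    if not demos:
        raise ValueError("need at least one demonstration")
    closest = min(demos, key=lambda pair: math.dist(pair[0], observation))
    return closest[1]


# Hypothetical recordings: (distance to wounded teammate, enemy in sight 0/1).
demos: List[Demonstration] = [
    ((5.0, 0.0), "move_to_teammate"),
    ((5.0, 1.0), "hide"),
    ((0.5, 0.0), "revive"),
]
```

A trained neural network policy would generalize far better than this lookup, but the input data has the same shape: pairs of observations and the actions a human chose.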
-The [Training with Imitation Learning](Training-Imitation-Learning.md) tutorial covers this 
-training mode with the **Banana Collector** sample environment.
+The [Training with Imitation Learning](Training-Imitation-Learning.md) tutorial
+covers this training mode with the **Banana Collector** sample environment.
-While the discussion so-far has mostly focused on training a single agent, with 
-ML-Agents, several training scenarios are possible.
-We are excited to see what kinds of novel and fun environments the community 
-creates. For those new to training intelligent agents, below are a few examples
-that can serve as inspiration:
-* Single-Agent. A single Agent linked to a single Brain, with its own reward 
-signal. The traditional way of training an agent. An example is any
-single-player game, such as Chicken.
-[Video Link](https://www.youtube.com/watch?v=fiQsmdwEGT8&feature=youtu.be).
-* Simultaneous Single-Agent. Multiple independent Agents with independent
-reward signals linked to a single Brain. A parallelized version of the 
-traditional training scenario, which can speed-up and stabilize the training 
-process. Helpful when you have multiple versions of the same character in an 
-environment who should learn similar behaviors. An example might be training 
-a dozen robot-arms to each open a door simultaneously.
-[Video Link](https://www.youtube.com/watch?v=fq0JBaiCYNA).
-* Adversarial Self-Play. Two interacting Agents with inverse reward signals 
-linked to a single Brain. In two-player games, adversarial self-play can allow
-an agent to become increasingly more skilled, while always having the perfectly 
-matched opponent: itself. This was the strategy employed when training AlphaGo, 
-and more recently used by OpenAI to train a human-beating 1-vs-1 Dota 2 agent.
-* Cooperative Multi-Agent. Multiple interacting Agents with a shared reward 
-signal linked to either a single or multiple different Brains. In this 
-scenario, all agents must work together to accomplish a task that cannot be 
-done alone. Examples include environments where each agent only has access to 
-partial information, which needs to be shared in order to accomplish the task 
-or collaboratively solve a puzzle.
-* Competitive Multi-Agent. Multiple interacting Agents with inverse reward 
-signals linked to either a single or multiple different Brains. In this 
-scenario, agents must compete with one another to either win a competition, 
-or obtain some limited set of resources. All team sports fall into this 
-scenario.
-* Ecosystem. Multiple interacting Agents with independent reward signals
-linked to either a single or multiple different Brains. This scenario can be 
-thought of as creating a small world in which animals with different goals all 
-interact, such as a savanna in which there might be zebras, elephants and 
-giraffes, or an autonomous driving simulation within an urban environment.
+While the discussion so far has mostly focused on training a single agent, with
+ML-Agents, several training scenarios are possible. We are excited to see what
+kinds of novel and fun environments the community creates. For those new to
+training intelligent agents, below are a few examples that can serve as
+inspiration:
+
+- Single-Agent. A single Agent linked to a single Brain, with its own reward
+  signal. The traditional way of training an agent. An example is any
+  single-player game, such as Chicken. [Video
+  Link](https://www.youtube.com/watch?v=fiQsmdwEGT8&feature=youtu.be).
+- Simultaneous Single-Agent. Multiple independent Agents with independent reward
+  signals linked to a single Brain. A parallelized version of the traditional
+  training scenario, which can speed up and stabilize the training process.
+  Helpful when you have multiple versions of the same character in an
+  environment who should learn similar behaviors. An example might be training a
+  dozen robot-arms to each open a door simultaneously. [Video
+  Link](https://www.youtube.com/watch?v=fq0JBaiCYNA).
+- Adversarial Self-Play. Two interacting Agents with inverse reward signals
+  linked to a single Brain. In two-player games, adversarial self-play can allow
+  an agent to become increasingly more skilled, while always having the
+  perfectly matched opponent: itself. This was the strategy employed when
+  training AlphaGo, and more recently used by OpenAI to train a human-beating
+  1-vs-1 Dota 2 agent.
+- Cooperative Multi-Agent. Multiple interacting Agents with a shared reward
+  signal linked to either a single or multiple different Brains. In this
+  scenario, all agents must work together to accomplish a task that cannot be
+  done alone. Examples include environments where each agent only has access to
+  partial information, which needs to be shared in order to accomplish the task
+  or collaboratively solve a puzzle.
+- Competitive Multi-Agent. Multiple interacting Agents with inverse reward
+  signals linked to either a single or multiple different Brains. In this
+  scenario, agents must compete with one another to either win a competition, or
+  obtain some limited set of resources. All team sports fall into this scenario.
+- Ecosystem. Multiple interacting Agents with independent reward signals linked
+  to either a single or multiple different Brains. This scenario can be thought
+  of as creating a small world in which animals with different goals all
+  interact, such as a savanna in which there might be zebras, elephants and
+  giraffes, or an autonomous driving simulation within an urban environment.
-Beyond the flexible training scenarios available, the ML-Agents toolkit includes 
+Beyond the flexible training scenarios available, the ML-Agents toolkit includes
-* **On Demand Decision Making** - With the ML-Agents toolkit it is possible to have agents 
-request decisions only when needed as opposed to requesting decisions at 
-every step of the environment. This enables training of turn based games, 
-games where agents 
-must react to events or games where agents can take actions of variable 
-duration. Switching between decision taking at every step and 
-on-demand-decision is one button click away. You can learn more about the 
-on-demand-decision feature 
-[here](Learning-Environment-Design-Agents.md#on-demand-decision-making).
+- **On Demand Decision Making** - With the ML-Agents toolkit it is possible to
+  have agents request decisions only when needed as opposed to requesting
+  decisions at every step of the environment. This enables training of
+  turn-based games, games where agents must react to events or games where
+  agents can take actions of variable duration. Switching between decision
+  taking at every step and on-demand-decision is one button click away. You can
+  learn more about the on-demand-decision feature
+  [here](Learning-Environment-Design-Agents.md#on-demand-decision-making).
-* **Memory-enhanced Agents** - In some scenarios, agents must learn to 
-remember the past in order to take the 
-best decision. When an agent only has partial observability of the environment, 
-keeping track of past observations can help the agent learn. We provide an 
-implementation of _Long Short-term Memory_ 
-([LSTM](https://en.wikipedia.org/wiki/Long_short-term_memory))
-in our trainers that enable the agent to store memories to be used in future 
-steps. You can learn more about enabling LSTM during training
-[here](Feature-Memory.md).
+- **Memory-enhanced Agents** - In some scenarios, agents must learn to remember
+  the past in order to take the best decision. When an agent only has partial
+  observability of the environment, keeping track of past observations can help
+  the agent learn. We provide an implementation of _Long Short-term Memory_
+  ([LSTM](https://en.wikipedia.org/wiki/Long_short-term_memory)) in our trainers
+  that enables the agent to store memories to be used in future steps. You can
+  learn more about enabling LSTM during training [here](Feature-Memory.md).
+
+- **Monitoring Agent’s Decision Making** - Since communication in ML-Agents is a
+  two-way street, we provide an agent Monitor class in Unity which can display
+  aspects of the trained agent, such as the agent’s perception of how well it is
+  doing (called **value estimates**) within the Unity environment itself. By
+  leveraging Unity as a visualization tool and providing these outputs in
+  real-time, researchers and developers can more easily debug an agent’s
+  behavior. You can learn more about using the Monitor class
+  [here](Feature-Monitor.md).
-* **Monitoring Agent’s Decision Making** - Since communication in ML-Agents
-is a two-way street, we provide an agent Monitor class in Unity which can
-display aspects of the trained agent, such as the agents perception on how
-well it is doing (called **value estimates**) within the Unity environment
-itself. By leveraging Unity as a visualization tool and providing these 
-outputs in real-time, researchers and developers can more easily debug an 
-agent’s behavior. You can learn more about using the Monitor class 
-[here](Feature-Monitor.md).
+- **Complex Visual Observations** - Unlike other platforms, where the agent’s
+  observation might be limited to a single vector or image, the ML-Agents
+  toolkit allows multiple cameras to be used for observations per agent. This
+  enables agents to learn to integrate information from multiple visual streams.
+  This can be helpful in several scenarios such as training a self-driving car
+  which requires multiple cameras with different viewpoints, or a navigational
+  agent which might need to integrate aerial and first-person visuals. You can
+  learn more about adding visual observations to an agent
+  [here](Learning-Environment-Design-Agents.md#multiple-visual-observations).
-* **Complex Visual Observations** - Unlike other platforms, where the agent’s
-observation might be limited to a single vector or image, the ML-Agents toolkit allows
-multiple cameras to be used for observations per agent. This enables agents to
-learn to integrate information from multiple visual streams. This can be
-helpful in several scenarios such as training a self-driving car which requires 
-multiple cameras with different viewpoints, or a navigational agent which might 
-need to integrate aerial and first-person visuals. You can learn more about
-adding visual observations to an agent 
-[here](Learning-Environment-Design-Agents.md#multiple-visual-observations).
-        
-* **Broadcasting** - As discussed earlier, an External Brain sends the
-observations for all its Agents to the Python API by default. This is helpful 
-for training or inference. Broadcasting is a feature which can be enabled
-for the other three modes (Player, Internal, Heuristic) where the Agent
-observations and actions are also sent to the Python API (despite the fact
-that the Agent is **not** controlled by the Python API). This feature is
-leveraged by Imitation Learning, where the observations and actions for a
-Player Brain are used to learn the policies of an agent through demonstration.
-However, this could also be helpful for the Heuristic and Internal Brains,
-particularly when debugging agent behaviors. You can learn more about using 
-the broadcasting feature 
-[here](Learning-Environment-Design-Brains.md#using-the-broadcast-feature).
+- **Broadcasting** - As discussed earlier, an External Brain sends the
+  observations for all its Agents to the Python API by default. This is helpful
+  for training or inference. Broadcasting is a feature which can be enabled for
+  the other three modes (Player, Internal, Heuristic) where the Agent
+  observations and actions are also sent to the Python API (despite the fact
+  that the Agent is **not** controlled by the Python API). This feature is
+  leveraged by Imitation Learning, where the observations and actions for a
+  Player Brain are used to learn the policies of an agent through demonstration.
+  However, this could also be helpful for the Heuristic and Internal Brains,
+  particularly when debugging agent behaviors. You can learn more about using
+  the broadcasting feature
+  [here](Learning-Environment-Design-Brains.md#using-the-broadcast-feature).
-* **Docker Set-up (Experimental)** - To facilitate setting up ML-Agents
-without installing Python or TensorFlow directly, we provide a 
-[guide](Using-Docker.md) on how
-to create and run a Docker container.
+- **Docker Set-up (Experimental)** - To facilitate setting up ML-Agents without
+  installing Python or TensorFlow directly, we provide a
+  [guide](Using-Docker.md) on how to create and run a Docker container.
-* **Cloud Training on AWS** - To facilitate using the ML-Agents toolkit on
-Amazon Web Services (AWS) machines, we provide a 
-[guide](Training-on-Amazon-Web-Service.md)
-on how to set-up EC2 instances in addition to a public pre-configured Amazon 
-Machine Image (AMI).
+- **Cloud Training on AWS** - To facilitate using the ML-Agents toolkit on
+  Amazon Web Services (AWS) machines, we provide a
+  [guide](Training-on-Amazon-Web-Service.md) on how to set up EC2 instances in
+  addition to a public pre-configured Amazon Machine Image (AMI).
-* **Cloud Training on Microsoft Azure** - To facilitate using the ML-Agents toolkit on
-Azure machines, we provide a 
-[guide](Training-on-Microsoft-Azure.md)
-on how to set-up virtual machine instances in addition to a pre-configured data science image.
+- **Cloud Training on Microsoft Azure** - To facilitate using the ML-Agents
+  toolkit on Azure machines, we provide a
+  [guide](Training-on-Microsoft-Azure.md) on how to set up virtual machine
+  instances in addition to a pre-configured data science image.
-To briefly summarize: The ML-Agents toolkit enables games and simulations built in Unity
-to serve as the platform for training intelligent agents. It is designed
-to enable a large variety of training modes and scenarios and comes packed
-with several features to enable researchers and developers to leverage
+To briefly summarize: The ML-Agents toolkit enables games and simulations built
+in Unity to serve as the platform for training intelligent agents. It is
+designed to enable a large variety of training modes and scenarios and comes
+packed with several features to enable researchers and developers to leverage
-To help you use ML-Agents, we've created several in-depth tutorials 
-for [installing ML-Agents](Installation.md), 
-[getting started](Getting-Started-with-Balance-Ball.md) 
-with the 3D Balance Ball environment (one of our many 
-[sample environments](Learning-Environment-Examples.md)) and 
+To help you use ML-Agents, we've created several in-depth tutorials for
+[installing ML-Agents](Installation.md),
+[getting started](Getting-Started-with-Balance-Ball.md) with the 3D Balance Ball
+environment (one of our many
+[sample environments](Learning-Environment-Examples.md)) and
-
--- a/docs/Python-API.md
+++ b/docs/Python-API.md
 # Python API

-The ML-Agents toolkit provides a Python API for controlling the agent simulation loop of a environment or game built with Unity. This API is used by the ML-Agent training algorithms (run with `learn.py`), but you can also write your Python programs using this API. 
+The ML-Agents toolkit provides a Python API for controlling the agent simulation
+loop of an environment or game built with Unity. This API is used by the ML-Agent
+training algorithms (run with `learn.py`), but you can also write your own
+Python programs using this API.
-* **UnityEnvironment** — the main interface between the Unity application and your code. Use UnityEnvironment to start and control a simulation or training session.
-* **BrainInfo** — contains all the data from agents in the simulation, such as observations and rewards.
-* **BrainParameters** — describes the data elements in a BrainInfo object. For example, provides the array length of an observation in BrainInfo.
+- **UnityEnvironment** — the main interface between the Unity application and
+  your code. Use UnityEnvironment to start and control a simulation or training
+  session.
+- **BrainInfo** — contains all the data from agents in the simulation, such as
+  observations and rewards.
+- **BrainParameters** — describes the data elements in a BrainInfo object. For
+  example, provides the array length of an observation in BrainInfo.
-These classes are all defined in the `python/unityagents` folder of the ML-Agents SDK.
+These classes are all defined in the `mlagents/envs` folder of the ML-Agents SDK.
-To communicate with an agent in a Unity environment from a Python program, the agent must either use an **External** brain or use a brain that is broadcasting (has its **Broadcast** property set to true). Your code is expected to return actions for agents with external brains, but can only observe broadcasting brains (the information you receive for an agent is the same in both cases). See [Using the Broadcast Feature](Learning-Environment-Design-Brains.md#using-the-broadcast-feature).
+To communicate with an agent in a Unity environment from a Python program, the
+agent must either use an **External** brain or use a brain that is broadcasting
+(has its **Broadcast** property set to true). Your code is expected to return
+actions for agents with external brains, but can only observe broadcasting
+brains (the information you receive for an agent is the same in both cases). See
+[Using the Broadcast
+Feature](Learning-Environment-Design-Brains.md#using-the-broadcast-feature).
-For a simple example of using the Python API to interact with a Unity environment, see the Basic [Jupyter](Background-Jupyter.md) notebook (`python/notebooks/getting-started.ipynb`), which opens an environment, runs a few simulation steps taking random actions, and closes the environment. 
+For a simple example of using the Python API to interact with a Unity
+environment, see the Basic [Jupyter](Background-Jupyter.md) notebook
+(`notebooks/getting-started.ipynb`), which opens an environment, runs a few
+simulation steps taking random actions, and closes the environment.
-_Notice: Currently communication between Unity and Python takes place over an open socket without authentication. As such, please make sure that the network where training takes place is secure. This will be addressed in a future release._
+_Notice: Currently communication between Unity and Python takes place over an
+open socket without authentication. As such, please make sure that the network
+where training takes place is secure. This will be addressed in a future
+release._
-Python-side communication happens through `UnityEnvironment` which is located in `python/unityagents`. To load a Unity environment from a built binary file, put the file in the same directory as `unityagents`. For example, if the filename of your Unity environment is 3DBall.app, in python, run:
+Python-side communication happens through `UnityEnvironment` which is located in
+`mlagents/envs`. To load a Unity environment from a built binary file, put the
+file in the same directory as `envs`. For example, if the filename of your Unity
+environment is `3DBall.app`, run the following in Python:

 ```python
 from unityagents import UnityEnvironment
-* `file_name` is the name of the environment binary (located in the root directory of the python project).
-* `worker_id` indicates which port to use for communication with the environment. For use in parallel training regimes such as A3C.
-* `seed` indicates the seed to use when generating random numbers during the training process. In environments which do not involve physics calculations, setting the seed enables reproducible experimentation by ensuring that the environment and trainers utilize the same random seed.
+- `file_name` is the name of the environment binary (located in the root
+  directory of the python project).
+- `worker_id` indicates which port to use for communication with the
+  environment. For use in parallel training regimes such as A3C.
+- `seed` indicates the seed to use when generating random numbers during the
+  training process. In environments which do not involve physics calculations,
+  setting the seed enables reproducible experimentation by ensuring that the
+  environment and trainers utilize the same random seed.
-If you want to directly interact with the Editor, you need to use `file_name=None`, then press the :arrow_forward: button in the Editor when the message _"Start training by pressing the Play button in the Unity Editor"_ is displayed on the screen
+If you want to directly interact with the Editor, you need to use
+`file_name=None`, then press the :arrow_forward: button in the Editor when the
+message _"Start training by pressing the Play button in the Unity Editor"_ is
+displayed on the screen.
-* **`visual_observations`** : A list of 4 dimensional numpy arrays. Matrix n of the list corresponds to the n<sup>th</sup> observation of the brain. 
-* **`vector_observations`** : A two dimensional numpy array of dimension `(batch size, vector observation size)`.
-* **`text_observations`** : A list of string corresponding to the agents text observations.
-* **`memories`** : A two dimensional numpy array of dimension `(batch size, memory size)` which corresponds to the memories sent at the previous step.
-* **`rewards`** : A list as long as the number of agents using the brain containing the rewards they each obtained at the previous step. 
-* **`local_done`** : A list as long as the number of agents using the brain containing  `done` flags (whether or not the agent is done). 
-* **`max_reached`** : A list as long as the number of agents using the brain containing true if the agents reached their max steps.
-* **`agents`** : A list of the unique ids of the agents using the brain.
-* **`previous_actions`** : A two dimensional numpy array of dimension `(batch size, vector action size)` if the vector action space is continuous and `(batch size, number of branches)` if the vector action space is discrete.
+- **`visual_observations`** : A list of 4 dimensional numpy arrays. Matrix n of
+  the list corresponds to the n<sup>th</sup> observation of the brain. 
+- **`vector_observations`** : A two dimensional numpy array of dimension `(batch
+  size, vector observation size)`.
+- **`text_observations`** : A list of strings corresponding to the agents' text
+  observations.
+- **`memories`** : A two dimensional numpy array of dimension `(batch size,
+  memory size)` which corresponds to the memories sent at the previous step.
+- **`rewards`** : A list as long as the number of agents using the brain
+  containing the rewards they each obtained at the previous step. 
+- **`local_done`** : A list as long as the number of agents using the brain
+  containing  `done` flags (whether or not the agent is done). 
+- **`max_reached`** : A list as long as the number of agents using the brain
+  containing true if the agents reached their max steps.
+- **`agents`** : A list of the unique ids of the agents using the brain.
+- **`previous_actions`** : A two dimensional numpy array of dimension `(batch
+  size, vector action size)` if the vector action space is continuous and
+  `(batch size, number of branches)` if the vector action space is discrete.
-Once loaded, you can use your UnityEnvironment object, which referenced by a variable named `env` in this example, can be used in the following way:  
+Once loaded, your UnityEnvironment object, which is referenced by a variable
+named `env` in this example, can be used in the following way:
+
-Prints all parameters relevant to the loaded environment and the external brains.  
+  Prints all parameters relevant to the loaded environment and the external
+  brains.  
-Send a reset signal to the environment, and provides a dictionary mapping brain names to BrainInfo objects.  
-    - `train_model` indicates whether to run the environment in train (`True`) or test (`False`) mode.
-    - `config` is an optional dictionary of configuration flags specific to the environment. For generic environments, `config` can be ignored. `config` is a dictionary of strings to floats where the keys are the names of the `resetParameters` and the values are their corresponding float values. Define the reset parameters on the [Academy Inspector](Learning-Environment-Design-Academy.md#academy-properties) window in the Unity Editor.
+  Sends a reset signal to the environment and provides a dictionary mapping
+  brain names to BrainInfo objects.  
+  - `train_model` indicates whether to run the environment in train (`True`) or
+    test (`False`) mode.
+  - `config` is an optional dictionary of configuration flags specific to the
+    environment. For generic environments, `config` can be ignored. `config` is
+    a dictionary of strings to floats where the keys are the names of the
+    `resetParameters` and the values are their corresponding float values.
+    Define the reset parameters on the [Academy
+    Inspector](Learning-Environment-Design-Academy.md#academy-properties) window
+    in the Unity Editor.
-Sends a step signal to the environment using the actions. For each brain : 
-    - `action` can be one dimensional arrays or two dimensional arrays if you have multiple agents per brains.
-    - `memory` is an optional input that can be used to send a list of floats per agents to be retrieved at the next step.
-    - `text_action` is an optional input that be used to send a single string per agent.
+  Sends a step signal to the environment using the actions. For each brain :
+  - `action` can be a one-dimensional array, or a two-dimensional array if you
+    have multiple agents per brain.
+  - `memory` is an optional input that can be used to send a list of floats per
+    agent to be retrieved at the next step.
+  - `text_action` is an optional input that can be used to send a single string
+    per agent.
-    
-    For example, to access the BrainInfo belonging to a brain called 'brain_name', and the BrainInfo field 'vector_observations':
+
+    For example, to access the BrainInfo belonging to a brain called
+    'brain_name', and the BrainInfo field 'vector_observations':
-    ``` 
+    ```
-    Note that if you have more than one external brain in the environment, you must provide dictionaries from brain names to arrays for     `action`, `memory` and `value`. For example: If you have two external brains named `brain1` and `brain2` each with one agent taking     two continuous actions, then you can have:
+    Note that if you have more than one external brain in the environment, you
+    must provide dictionaries from brain names to arrays for `action`, `memory`
+    and `value`. For example: If you have two external brains named `brain1` and
+    `brain2` each with one agent taking two continuous actions, then you can
+    have:
-Returns a dictionary mapping brain names to BrainInfo objects.  
- **Close : `env.close()`**  
-Sends a shutdown signal to the environment and closes the communication socket.
+    Returns a dictionary mapping brain names to BrainInfo objects.  
+- **Close : `env.close()`**
+  Sends a shutdown signal to the environment and closes the communication
+  socket.
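
The reset, step and close calls described above can be sketched as a single episode loop. This is only an illustration, not code from the ML-Agents SDK: `StubEnvironment`, `StubBrainInfo` and `run_episode` are hypothetical stand-ins so the loop runs without a Unity build. They assume one agent per brain, two continuous actions per agent, and an episode that ends after a fixed number of steps. Only the field names (`vector_observations`, `rewards`, `local_done`, `agents`) and the idea of passing a dictionary from brain names to actions come from the text above.

```python
from collections import namedtuple

# Hypothetical stand-in for BrainInfo, carrying a few of the fields listed above.
StubBrainInfo = namedtuple(
    "StubBrainInfo", ["vector_observations", "rewards", "local_done", "agents"]
)


class StubEnvironment:
    """Hypothetical stand-in for UnityEnvironment; not part of ML-Agents."""

    def __init__(self, brain_names, episode_length=3):
        self.brain_names = list(brain_names)
        self.episode_length = episode_length
        self.steps = 0
        self.closed = False

    def _info(self, reward):
        done = self.steps >= self.episode_length
        # One agent per brain, each with a two-element vector observation.
        return {
            name: StubBrainInfo([[float(self.steps), 0.0]], [reward], [done], [0])
            for name in self.brain_names
        }

    def reset(self, config=None):
        # Returns a dictionary mapping brain names to (stub) BrainInfo objects.
        self.steps = 0
        return self._info(reward=0.0)

    def step(self, action):
        # With several brains, `action` maps each brain name to its actions.
        missing = set(self.brain_names) - set(action)
        if missing:
            raise ValueError("no action for brains: %s" % sorted(missing))
        self.steps += 1
        return self._info(reward=1.0)

    def close(self):
        self.closed = True


def run_episode(env):
    """Reset, step until every agent reports done, then close."""
    info = env.reset()
    total = {name: 0.0 for name in info}
    while not all(all(b.local_done) for b in info.values()):
        # Two continuous actions for the single agent of each brain.
        action = {name: [[0.5, -0.5]] for name in info}
        info = env.step(action)
        for name, brain_info in info.items():
            total[name] += sum(brain_info.rewards)
    env.close()
    return total
```

Calling `run_episode(StubEnvironment(["brain1", "brain2"]))` returns the summed reward per brain. With a real environment, the same loop shape would apply, with the stub replaced by the loaded `env` and the constant actions replaced by the output of a policy.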
--- a/docs/Readme.md
+++ b/docs/Readme.md
 # Unity ML-Agents Toolkit Documentation

 ## Installation & Set-up
- * [Installation](Installation.md)
-    * [Background: Jupyter Notebooks](Background-Jupyter.md)
-    * [Docker Set-up](Using-Docker.md)
- * [Basic Guide](Basic-Guide.md)
+
+* [Installation](Installation.md)
+  * [Background: Jupyter Notebooks](Background-Jupyter.md)
+  * [Docker Set-up](Using-Docker.md)
+* [Basic Guide](Basic-Guide.md)
- * [ML-Agents Toolkit Overview](ML-Agents-Overview.md)
-    * [Background: Unity](Background-Unity.md)
-    * [Background: Machine Learning](Background-Machine-Learning.md)
-    * [Background: TensorFlow](Background-TensorFlow.md)
- * [Getting Started with the 3D Balance Ball Environment](Getting-Started-with-Balance-Ball.md)
- * [Example Environments](Learning-Environment-Examples.md)
+
+* [ML-Agents Toolkit Overview](ML-Agents-Overview.md)
+  * [Background: Unity](Background-Unity.md)
+  * [Background: Machine Learning](Background-Machine-Learning.md)
+  * [Background: TensorFlow](Background-TensorFlow.md)
+* [Getting Started with the 3D Balance Ball Environment](Getting-Started-with-Balance-Ball.md)
+* [Example Environments](Learning-Environment-Examples.md)
- * [Making a New Learning Environment](Learning-Environment-Create-New.md)
- * [Designing a Learning Environment](Learning-Environment-Design.md)
-     * [Agents](Learning-Environment-Design-Agents.md)
-     * [Academy](Learning-Environment-Design-Academy.md)
-     * [Brains](Learning-Environment-Design-Brains.md): [Player](Learning-Environment-Design-Player-Brains.md), [Heuristic](Learning-Environment-Design-Heuristic-Brains.md), [Internal & External](Learning-Environment-Design-External-Internal-Brains.md)
- * [Learning Environment Best Practices](Learning-Environment-Best-Practices.md)
- * [Using the Monitor](Feature-Monitor.md)
- * [Using an Executable Environment](Learning-Environment-Executable.md)
- * [TensorFlowSharp in Unity (Experimental)](Using-TensorFlow-Sharp-in-Unity.md)
- 
+
+* [Making a New Learning Environment](Learning-Environment-Create-New.md)
+* [Designing a Learning Environment](Learning-Environment-Design.md)
+  * [Agents](Learning-Environment-Design-Agents.md)
+  * [Academy](Learning-Environment-Design-Academy.md)
+  * [Brains](Learning-Environment-Design-Brains.md):
+    [Player](Learning-Environment-Design-Player-Brains.md),
+    [Heuristic](Learning-Environment-Design-Heuristic-Brains.md),
+    [Internal & External](Learning-Environment-Design-External-Internal-Brains.md)
+* [Learning Environment Best Practices](Learning-Environment-Best-Practices.md)
+* [Using the Monitor](Feature-Monitor.md)
+* [Using an Executable Environment](Learning-Environment-Executable.md)
+* [TensorFlowSharp in Unity (Experimental)](Using-TensorFlow-Sharp-in-Unity.md)
+
- * [Training ML-Agents](Training-ML-Agents.md)
- * [Training with Proximal Policy Optimization](Training-PPO.md)
- * [Training with Curriculum Learning](Training-Curriculum-Learning.md)
- * [Training with Imitation Learning](Training-Imitation-Learning.md)
- * [Training with LSTM](Feature-Memory.md)
- * [Training on the Cloud with Amazon Web Services](Training-on-Amazon-Web-Service.md)
- * [Training on the Cloud with Microsoft Azure](Training-on-Microsoft-Azure.md)
- * [Using TensorBoard to Observe Training](Using-Tensorboard.md)
+
+* [Training ML-Agents](Training-ML-Agents.md)
+* [Training with Proximal Policy Optimization](Training-PPO.md)
+* [Training with Curriculum Learning](Training-Curriculum-Learning.md)
+* [Training with Imitation Learning](Training-Imitation-Learning.md)
+* [Training with LSTM](Feature-Memory.md)
+* [Training on the Cloud with Amazon Web Services](Training-on-Amazon-Web-Service.md)
+* [Training on the Cloud with Microsoft Azure](Training-on-Microsoft-Azure.md)
+* [Using TensorBoard to Observe Training](Using-Tensorboard.md)
- * [Migrating from earlier versions of ML-Agents](Migrating.md)
- * [Frequently Asked Questions](FAQ.md)
- * [ML-Agents Glossary](Glossary.md)
- * [Limitations](Limitations.md)
- 
+
+* [Migrating from earlier versions of ML-Agents](Migrating.md)
+* [Frequently Asked Questions](FAQ.md)
+* [ML-Agents Glossary](Glossary.md)
+* [Limitations](Limitations.md)
+
- * [API Reference](API-Reference.md)
- * [How to use the Python API](Python-API.md)
- * [Wrapping Learning Environment as a Gym](../gym-unity/Readme.md)
+
+* [API Reference](API-Reference.md)
+* [How to use the Python API](Python-API.md)
+* [Wrapping Learning Environment as a Gym](../gym-unity/Readme.md)
--- a/docs/Training-Curriculum-Learning.md
+++ b/docs/Training-Curriculum-Learning.md

 ### Specifying a Metacurriculum

-We first create a folder inside `python/curricula/` for the environment we want
+We first create a folder inside `curricula/` for the environment we want
-`python/curricula/wall-jump/`. We will place our curriculums inside this folder.
+`curricula/wall-jump/`. We will place our curriculums inside this folder.

 ### Specifying a Curriculum


 Once our curriculum is defined, we have to use the reset parameters we defined
 and modify the environment from the agent's `AgentReset()` function. See
-[WallJumpAgent.cs](https://github.com/Unity-Technologies/ml-agents/blob/master/unity-environment/Assets/ML-Agents/Examples/WallJump/Scripts/WallJumpAgent.cs)
+[WallJumpAgent.cs](https://github.com/Unity-Technologies/ml-agents/blob/master/MLAgentsSDK/Assets/ML-Agents/Examples/WallJump/Scripts/WallJumpAgent.cs)
 for an example. Note that if the Academy's __Max Steps__ is not set to some
 positive number the environment will never be reset. The Academy must reset
 for the environment to reset.
 brains---BigWallBrain and SmallWallBrain. If we want to define a curriculum for
 the BigWallBrain, we will save `BigWallBrain.json` into
-`python/curricula/wall-jump/`.
+`curricula/wall-jump/`.

 ### Training with a Curriculum

--- a/docs/Training-Imitation-Learning.md
+++ b/docs/Training-Imitation-Learning.md
 3. Set the "Student" brain to External mode.
 4. Link the brains to the desired agents (one agent as the teacher and at least one agent as a student).
 5. In `trainer_config.yaml`, add an entry for the "Student" brain. Set the `trainer` parameter of this entry to `imitation`, and the `brain_to_imitate` parameter to the name of the teacher brain: "Teacher". Additionally, set `batches_per_epoch`, which controls how much training to do each moment. Increase the `max_steps` option if you'd like to keep training the agents for a longer period of time.
-6. Launch the training process with `python3 python/learn.py --train --slow`, and press the :arrow_forward: button in Unity when the message _"Start training by pressing the Play button in the Unity Editor"_ is displayed on the screen
+6. Launch the training process with `learn --train --slow`, and press the :arrow_forward: button in Unity when the message _"Start training by pressing the Play button in the Unity Editor"_ is displayed on the screen.
 7. From the Unity window, control the agent with the Teacher brain by providing "teacher demonstrations" of the behavior you would like to see.
 8. Watch as the agent(s) with the student brain attached begin to behave similarly to the demonstrations.
 9. Once the Student agents are exhibiting the desired behavior, end the training process with `CTRL+C` from the command line.
--- a/docs/Training-ML-Agents.md
+++ b/docs/Training-ML-Agents.md

 And then opening the URL: [localhost:6006](http://localhost:6006).
 
-When training is finished, you can find the saved model in the `python/models` folder under the assigned run-id — in the cats example, the path to the model would be `python/models/cob_1/CatsOnBicycles_cob_1.bytes`.
+When training is finished, you can find the saved model in the `models` folder
+under the assigned run-id — in the cats example, the path to the model would be
+`models/cob_1/CatsOnBicycles_cob_1.bytes`.

 While this example used the default training hyperparameters, you can edit the [trainer_config.yaml file](#training-config-file) with a text editor to set different values.

 * `--curriculum=<file>` – Specify a curriculum JSON file for defining the lessons for curriculum training. See [Curriculum Training](Training-Curriculum-Learning.md) for more information.
 * `--keep-checkpoints=<n>` – Specify the maximum number of model checkpoints to keep. Checkpoints are saved after the number of steps specified by the `save-freq` option. Once the maximum number of checkpoints has been reached, the oldest checkpoint is deleted when saving a new checkpoint. Defaults to 5.
 * `--lesson=<n>` – Specify which lesson to start with when performing curriculum training. Defaults to 0.
-* `--load` – If set, the training code loads an already trained model to initialize the neural network before training. The learning code looks for the model in `python/models/<run-id>/` (which is also where it saves models at the end of training). When not set (the default), the neural network weights are randomly initialized and an existing model is not loaded.
+* `--load` – If set, the training code loads an already trained model to initialize the neural network before training. The learning code looks for the model in `models/<run-id>/` (which is also where it saves models at the end of training). When not set (the default), the neural network weights are randomly initialized and an existing model is not loaded.
 * `--num-runs=<n>` - Sets the number of concurrent training sessions to perform. Defaults to 1. Set to a higher value when benchmarking performance and multiple training sessions are desired. Training sessions are independent and do not improve learning performance.
 * `--run-id=<path>` – Specifies an identifier for each training run. This identifier is used to name the subdirectories in which the trained model and summary statistics are saved as well as the saved model itself. The default id is "ppo". If you use TensorBoard to view the training statistics, always set a unique run-id for each training run. (The statistics for all runs with the same id are combined as if they were produced by the same session.)
 * `--save-freq=<n>` – Specifies how often (in steps) to save the model during training. Defaults to 50000.
--- a/docs/dox-ml-agents.conf
+++ b/docs/dox-ml-agents.conf
 # spaces. See also FILE_PATTERNS and EXTENSION_MAPPING
 # Note: If this tag is empty the current directory is searched.

-INPUT                  = ../unity-environment/Assets/ML-Agents/Scripts/Academy.cs \
-                         ../unity-environment/Assets/ML-Agents/Scripts/Agent.cs \
-                         ../unity-environment/Assets/ML-Agents/Scripts/Monitor.cs \
-                         ../unity-environment/Assets/ML-Agents/Scripts/Decision.cs
+INPUT                  = ../MLAgentsSDK/Assets/ML-Agents/Scripts/Academy.cs \
+                         ../MLAgentsSDK/Assets/ML-Agents/Scripts/Agent.cs \
+                         ../MLAgentsSDK/Assets/ML-Agents/Scripts/Monitor.cs \
+                         ../MLAgentsSDK/Assets/ML-Agents/Scripts/Decision.cs

 # This tag can be used to specify the character encoding of the source files
 # that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
--- a/gym-unity/Readme.md
+++ b/gym-unity/Readme.md

 The returned environment `env` will function as a gym.

-For more on using the gym interface, see our [Jupyter Notebook tutorial](../python/notebooks/getting-started-gym.ipynb).
-
+For more on using the gym interface, see our [Jupyter Notebook tutorial](../notebooks/getting-started-gym.ipynb).
- * It is only possible to use an environment with a single Brain.
- * By default the first visual observation is provided as the `observation`, if present. Otherwise vector observations are provided. 
- * All `BrainInfo` output from the environment can still be accessed from the `info` provided by `env.step(action)`.
- * Stacked vector observations are not supported.
- * Environment registration for use with `gym.make()` is currently not supported.
-
+* It is only possible to use an environment with a single Brain.
+* By default the first visual observation is provided as the `observation`, if
+  present. Otherwise vector observations are provided. 
+* All `BrainInfo` output from the environment can still be accessed from the
+  `info` provided by `env.step(action)`.
+* Stacked vector observations are not supported.
+* Environment registration for use with `gym.make()` is currently not supported.

 ## Running OpenAI Baselines Algorithms

--- a/ml-agents-protobuf/README.md
+++ b/ml-agents-protobuf/README.md
 1. Install pre-requisites.
 2. Un-comment line 4 in `make.bat`, and set it to the correct Grpc.Tools sub-directory.
 3. Run `make.bat`.
-4. Copy created `communicator_objects` and `CommunicatorObjects` folders to their respective sub-directories within the `ml-agents` repository. 
-    * For Python, the generated files should be copied to: `python/communicator_objects` 
-    * For C#, the generated files should be copied to: `unity-environment/Assets/ML-Agents/Scripts/CommunicatorObjects`.
+4. Copy created `communicator_objects` and `CommunicatorObjects` folders to
+   their respective sub-directories within the `ml-agents` repository.
+    * For Python, the generated files should be copied to:
+      `mlagents/envs/communicator_objects`
+    * For C#, the generated files should be copied to:
+      `MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects`.