
Several, small documentation improvements (#3903)

* Several, small documentation improvements

- Re-organize main repo README
- Minor clean-ups to Python package-specific readme files
- Clean-up to Unity Inference Engine page
- Update to the docs README
- Added a specific cross-platform section in ML-Agents Overview to amplify Barracuda
- Updated the links in Limitations.md to point to the specific subsections
- Cleaned up the Designing a Learning Environment page. Added an intro paragraph.
- Updated the installation guide to specifically call out local installation
- A few minor formatting, spelling errors fixed.
/release_1_branch
GitHub · 5 years ago
Current commit: 759e222e
18 files changed, with 610 insertions and 465 deletions
  1. README.md (26 lines changed)
  2. com.unity.ml-agents/Documentation~/com.unity.ml-agents.md (130 lines changed)
  3. docs/Background-TensorFlow.md (6 lines changed)
  4. docs/Custom-SideChannels.md (92 lines changed)
  5. docs/Getting-Started.md (14 lines changed)
  6. docs/Installation.md (165 lines changed)
  7. docs/Learning-Environment-Design-Agents.md (8 lines changed)
  8. docs/Learning-Environment-Design.md (58 lines changed)
  9. docs/Learning-Environment-Examples.md (94 lines changed)
  10. docs/Learning-Environment-Executable.md (2 lines changed)
  11. docs/Limitations.md (8 lines changed)
  12. docs/ML-Agents-Overview.md (100 lines changed)
  13. docs/Profiling-Python.md (68 lines changed)
  14. docs/Readme.md (14 lines changed)
  15. docs/Unity-Inference-Engine.md (60 lines changed)
  16. gym-unity/README.md (158 lines changed)
  17. ml-agents-envs/README.md (37 lines changed)
  18. ml-agents/README.md (35 lines changed)

README.md (26 lines changed)


<img src="docs/images/image-banner.png" align="middle" width="3000"/>
# Unity ML-Agents Toolkit (Beta)
[![docs badge](https://img.shields.io/badge/docs-reference-blue.svg)](https://github.com/Unity-Technologies/ml-agents/tree/release_1_docs/docs/)
[![license badge](https://img.shields.io/badge/license-Apache--2.0-green.svg)](LICENSE)

## Features
- Flexible Unity SDK that can be integrated into your game or custom Unity scene
- 15+ [example Unity environments](docs/Learning-Environment-Examples.md)
- Training using two deep reinforcement learning algorithms, Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC)
- Self-play mechanism for training agents in adversarial scenarios
- Train memory-enhanced agents using deep reinforcement learning
- Easily definable Curriculum Learning scenarios for complex tasks
- Train robust agents using environment randomization
- Train using multiple concurrent Unity environment instances
- Utilizes the [Unity Inference Engine](docs/Unity-Inference-Engine.md) to provide native cross-platform support
- Unity environment [control from Python](docs/Python-API.md)
- Wrap Unity learning environments as a [gym](gym-unity/README.md)

See our [ML-Agents Overview](docs/ML-Agents-Overview.md) page for detailed descriptions of all these features.
## Releases & Documentation

com.unity.ml-agents/Documentation~/com.unity.ml-agents.md (130 lines changed)


# About ML-Agents package (`com.unity.ml-agents`)
The Unity ML-Agents package contains the C# SDK for the [Unity ML-Agents Toolkit].
The package allows you to convert any Unity scene into a learning environment and train character behaviors using a variety of machine learning algorithms. Additionally, it allows you to embed these trained behaviors back into Unity scenes to control your characters. More specifically, the package provides the following core functionalities:
- Define Agents: entities, or characters, whose behavior will be learned. Agents are entities that generate observations (through sensors), take actions, and receive rewards from the environment.
- Define Behaviors: entities that specify how an agent should act. Multiple agents can share the same Behavior and a scene may have multiple Behaviors.
- Record demonstrations of an agent within the Editor. You can use demonstrations to help train a behavior for that agent.
- Embed a trained behavior into the scene via the [Unity Inference Engine]. Embedded behaviors allow you to switch an Agent between learning and inference.
Note that the _ML-Agents_ package does not contain the machine learning algorithms for training behaviors. The _ML-Agents_ package only supports instrumenting a Unity scene, setting it up for training, and then embedding the trained model back into your Unity scene. The machine learning algorithms that orchestrate training are part of the companion [Python package].
| **Location**      | **Description**                                                         |
| ----------------- | ----------------------------------------------------------------------- |
| _Documentation~_  | Contains the documentation for the Unity package.                       |
| _Editor_          | Contains utilities for Editor windows and drawers.                      |
| _Plugins_         | Contains third-party DLLs.                                              |
| _Runtime_         | Contains core C# APIs for integrating ML-Agents into your Unity scene.  |
| _Tests_           | Contains the unit tests for the package.                                |
To install this _ML-Agents_ package, follow the instructions in the [Package Manager documentation].
This version of the Unity ML-Agents package is compatible with the following versions of the Unity Editor:

- 2018.4 and later
## Known Limitations
Training is limited to the Unity Editor and Standalone builds on Windows, MacOS, and Linux with the Mono scripting backend. Currently, training does not work with the IL2CPP scripting backend. Your environment will default to inference mode if training is not supported or is not currently running.
Inference is executed via the [Unity Inference Engine](https://docs.unity3d.com/Packages/com.unity.barracuda@latest/index.html).
All platforms supported.

All platforms supported except:

- WebGL and GLES 3/2 on Android / iPhone

**NOTE:** Mobile platform support includes:

- Vulkan for Android
- Metal for iOS.
### Headless Mode

`Academy.Instance.EnvironmentStep()`
### Unity Inference Engine Models
the documentation, you can check out our [GitHub Repository], which also includes a number of ways to [connect with us] including our [ML-Agents Forum].
[Unity ML-Agents Toolkit]: https://github.com/Unity-Technologies/ml-agents
[Unity Inference Engine]: https://docs.unity3d.com/Packages/com.unity.barracuda@latest/index.html
[Package Manager documentation]: https://docs.unity3d.com/Manual/upm-ui-install.html
[installation instructions]: https://github.com/Unity-Technologies/ml-agents/blob/release_1_docs/docs/Installation.md
[GitHub Repository]: https://github.com/Unity-Technologies/ml-agents
[Python package]: https://github.com/Unity-Technologies/ml-agents
[Execution Order of Event Functions]: https://docs.unity3d.com/Manual/ExecutionOrder.html
[connect with us]: https://github.com/Unity-Technologies/ml-agents#community-and-feedback
[ML-Agents Forum]: https://forum.unity.com/forums/ml-agents.453/

docs/Background-TensorFlow.md (6 lines changed)


performing computations using data flow graphs, the underlying representation of deep learning models. It facilitates training and inference on CPUs and GPUs in a desktop, server, or mobile device. Within the ML-Agents Toolkit, when you train the behavior of an agent, the output is a model (.nn) file that you can then associate with an Agent. Unless you implement a new algorithm, the use of TensorFlow is mostly abstracted away and behind the scenes.
## TensorBoard

docs/Custom-SideChannels.md (92 lines changed)


## Overview
In order to use a side channel, it must be implemented as both Unity and Python classes.

The side channel will have to implement the `SideChannel` abstract class and the following method:

- `OnMessageReceived(IncomingMessage msg)`: You must implement this method and read the data from the IncomingMessage. The data must be read in the order that it was written.

The side channel must also assign a `ChannelId` property in the constructor. The `ChannelId` is a Guid (or UUID in Python) used to uniquely identify a side channel. This Guid must be the same on C# and Python. There can only be one side channel of a certain id during communication.

To send data from C# to Python, create an `OutgoingMessage` instance, add data to it, call the `base.QueueMessageToSend(msg)` method inside the side channel, and call the `OutgoingMessage.Dispose()` method.

To register a side channel on the Unity side, call `SideChannelManager.RegisterSideChannel` with the side channel as the only argument.
The side channel will have to implement the `SideChannel` abstract class. You must implement:

- `on_message_received(self, msg: "IncomingMessage") -> None`: You must implement this method and read the data from the IncomingMessage. The data must be read in the order that it was written.

The side channel must also assign a `channel_id` property in the constructor. The `channel_id` is a UUID (referred to in C# as a Guid) used to uniquely identify a side channel. This number must be the same on C# and Python. There can only be one side channel of a certain id during communication.
To assign the `channel_id`, call the abstract class constructor with the appropriate `channel_id` as follows:
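For instance, a minimal sketch of such a constructor might look like the following. The import path and the UUID value here are illustrative assumptions (the UUID must match whatever Guid the corresponding C# channel uses):

```python
import uuid

from mlagents_envs.side_channel.side_channel import SideChannel, IncomingMessage


class MyStringChannel(SideChannel):  # hypothetical channel for illustration
    def __init__(self) -> None:
        # Pass the channel's UUID to the abstract class constructor.
        super().__init__(uuid.UUID("621f0a70-4f87-11ea-a6bf-784f4387d1f7"))

    def on_message_received(self, msg: IncomingMessage) -> None:
        # Read data in the same order it was written on the C# side.
        print(msg.read_string())
```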
To send a byte array from Python to C#, create an `OutgoingMessage` instance, add data to it, and call the `super().queue_message_to_send(msg)` method inside the side channel.

To register a side channel on the Python side, pass the side channel as an argument when creating the `UnityEnvironment` object. One of the arguments of the constructor (`side_channels`) is a list of side channels.
Below is a simple implementation of a side channel that will exchange ASCII encoded strings between a Unity environment and Python.

The first step is to create the `StringLogSideChannel` class within the Unity project. Here is an implementation of a `StringLogSideChannel` that will listen for messages from Python and print them to the Unity debug log, as well as send error messages from Unity to Python.
```csharp
using UnityEngine;
// ...
```
instantiated and registered. This can typically be done wherever the logic of the side channel makes sense to be associated, for example on a MonoBehaviour object that might need to access data from the side channel. Here we show a simple MonoBehaviour object which instantiates and registers the new side channel. If you have not done it already, make sure that the MonoBehaviour which registers the side channel is attached to a GameObject which will be live in your Unity scene.
```csharp
using UnityEngine;
// ...
```
### Example Python code
Now that we have created the necessary Unity C# classes, we can create their Python counterparts.
```python
from mlagents_envs.environment import UnityEnvironment

super().queue_message_to_send(msg)
```
We can then instantiate the new side channel, launch a `UnityEnvironment` with that side channel active, and send a series of messages to the Unity environment from Python using it.
```python
# Create the channel

env.close()
```
Now, if you run this script and press `Play` in the Unity Editor when prompted, the console in the Unity Editor will display a message at every Python step.
Additionally, if you press the Space Bar in the Unity Engine, a message will
appear in the terminal.
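For reference, a condensed sketch of the Python-side script described above could look like this. The `send_string` helper is illustrative and stands in for whatever write method your `StringLogChannel` implementation exposes:

```python
from mlagents_envs.environment import UnityEnvironment

# Create the channel and pass it to the environment; with no file_name,
# the environment connects to the Unity Editor when you press Play.
string_log = StringLogChannel()  # the Python class sketched earlier
env = UnityEnvironment(side_channels=[string_log])
env.reset()

for i in range(10):
    string_log.send_string(f"Message {i} from Python")  # hypothetical helper
    env.step()  # Unity prints the message to its console during this step

env.close()
```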

docs/Getting-Started.md (14 lines changed)


Note how the `Mean Reward` value printed to the screen increases as training
progresses. This is a positive sign that training is succeeding.
**Note**: You can train using an executable rather than the Editor. To do so,
follow the instructions in
[Using an Executable](Learning-Environment-Executable.md).
### Observing Training Progress
Once you start training using `mlagents-learn` in the way described in the

(denoted by the `Saved Model` message) you can add it to the Unity project and
use it with compatible Agents (the Agents that generated the model). **Note:**
Do not just close the Unity Window once the `Saved Model` message appears.
Either wait for the training process to close the window or press `Ctrl+C` at the command-line prompt. If you close the window manually, the `.nn` file containing the trained model is not exported into the ml-agents folder.
If you've quit the training early using `Ctrl+C` and want to resume training, run the same command again, appending the `--resume` flag:
```sh
mlagents-learn config/trainer_config.yaml --run-id=first3DBallRun --resume
```

docs/Installation.md (165 lines changed)


# Installation
The ML-Agents Toolkit contains several components:
- Unity package ([`com.unity.ml-agents`](../com.unity.ml-agents/)) contains the Unity C# SDK that will be integrated into your Unity scene.
- Three Python packages:
  - [`mlagents`](../ml-agents/) contains the machine learning algorithms that enable you to train behaviors in your Unity scene. Most users of ML-Agents will only need to directly install `mlagents`.
  - [`mlagents_envs`](../ml-agents-envs/) contains a Python API to interact with a Unity scene. It is a foundational layer that facilitates data messaging between a Unity scene and the Python machine learning algorithms. Consequently, `mlagents` depends on `mlagents_envs`.
  - [`gym_unity`](../gym-unity/) provides a Python wrapper for your Unity scene that supports the OpenAI Gym interface.
- Unity [Project](../Project/) that contains several [example environments](Learning-Environment-Examples.md) that highlight the various features of the toolkit to help you get started.

Consequently, to install and use the ML-Agents Toolkit you will need to:

- Install Unity (2018.4 or later)
- Install Python (3.6.1 or higher)
- Clone this repository (Optional)
- Install the `com.unity.ml-agents` Unity package
- Install the `mlagents` Python package
[Download](https://unity3d.com/get-unity/download) and install Unity. We strongly recommend that you install Unity through the Unity Hub as it will enable you to manage multiple Unity versions.
We recommend [installing](https://www.python.org/downloads/) Python 3.6 or 3.7. If your Python environment doesn't include `pip3`, see these
Although we do not provide support for Anaconda installation on Windows, the previous
### Clone the ML-Agents Toolkit Repository (Optional)
Now that you have installed Unity and Python, you can install the Unity and Python packages. You do not need to clone the repository to install those packages, but you may choose to clone the repository if you'd like to download our example environments and training configurations to experiment with them (some of our tutorials / guides assume you have access to our example environments).
The `--branch release_1` option will switch to the tag of the latest stable release. Omitting that will get the `master` branch, which is potentially unstable.
#### Advanced: Local Installation for Development
You will need to clone the repository if you plan to modify or extend the
ML-Agents Toolkit for your purposes. If you plan to contribute those changes
back, make sure to clone the `master` branch (by omitting `--branch release_1`
from the command above). See our
[Contributions Guidelines](../com.unity.ml-agents/CONTRIBUTING.md) for more
information on contributing to the ML-Agents Toolkit.
The Unity ML-Agents C# SDK is a Unity Package. You can install the `com.unity.ml-agents` package [directly from the Package Manager registry](https://docs.unity3d.com/Manual/upm-ui-install.html). Please make sure you enable 'Preview Packages' in the 'Advanced' dropdown in order to find it.
#### Advanced: Local Installation for Development
You can [add the local](https://docs.unity3d.com/Manual/upm-ui-local.html) `com.unity.ml-agents` package (from the repository that you just cloned) to your project by:
1. navigating to the menu `Window` -> `Package Manager`.
**NOTE:** In Unity 2018.4 the `+` button is on the bottom right of the packages list, and in Unity 2019.3 it's on the top left of the packages list.
height="340" border="10" />
height="300"
border="10" />
height="340" border="10" />
height="300"
border="10" />
If you are going to follow the examples from our documentation, you can open the `Project` folder in Unity and start tinkering immediately.
Installing the `mlagents` Python package involves installing other Python packages that `mlagents` depends on. So you may run into installation issues if your machine has older versions of any of those dependencies already installed. Consequently, our supported path for installing `mlagents` is to leverage Python Virtual Environments. Virtual Environments provide a mechanism for isolating the dependencies for each project and are supported on Mac / Windows / Linux. We offer a dedicated [guide on Virtual Environments](Using-Virtual-Environment.md).
To install the `mlagents` Python package, activate your virtual environment and run from the command line:
Note that this will install `mlagents` from PyPi, _not_ from the cloned repository. If you installed this correctly, you should be able to run `mlagents-learn --help`, after which you will see the Unity logo and the command line parameters you can use with `mlagents-learn`.
#### Advanced: Local Installation for Development
If you intend to make modifications to `mlagents` or `mlagents_envs`, you should install the packages from the cloned repository rather than from PyPi. To do this, you will need to install `mlagents` and `mlagents_envs` separately. From the repository's root directory, run:
```sh
pip3 install -e ./ml-agents-envs
pip3 install -e ./ml-agents
```

Running pip with the `-e` flag will let you make changes to the Python files directly and have those reflected when you run `mlagents-learn`. It is important to install these packages in this order as the `mlagents` package depends on `mlagents_envs`, and installing it in the other order will download `mlagents_envs` from PyPi.
The [Getting Started](Getting-Started.md) guide contains several short tutorials on setting up the ML-Agents Toolkit within Unity, running a pre-trained model, in addition to building and training environments.
## Help

docs/Learning-Environment-Design-Agents.md (8 lines changed)


distinguish opposing agents, set the team ID to different integer values in the
behavior parameters script on the agent prefab.
<p align="center">
  <img src="images/team_id.png"
      alt="Team ID"
      width="375" border="10" />
</p>
**_Team ID must be 0 or an integer greater than 0._**

<p align="center">
<img src="images/demo_component.png"
alt="Demonstration Recorder"
width="375" border="10" />
width="450" border="10" />
</p>
When `Record` is checked, a demonstration will be created whenever the scene is

docs/Learning-Environment-Design.md (58 lines changed)


# Designing a Learning Environment
This page contains general advice on how to design your learning environment, in addition to an overview of the aspects of the ML-Agents Unity SDK that pertain to setting up your scene and simulation, as opposed to designing your agents within the scene. We have a dedicated page on [Designing Agents](Learning-Environment-Design-Agents.md) which includes how to instrument observations, actions and rewards, define teams for multi-agent scenarios and record agent demonstrations for imitation learning.
To help on-board to the entire set of functionality provided by the ML-Agents
Toolkit, we recommend exploring our [API documentation](API-Reference.md).
Additionally, our [example environments](Learning-Environment-Examples.md) are a
great resource as they provide sample usage of almost all of our features.
## The Simulation and Training Process

for each training episode. Otherwise, the agent would probably only learn to solve one particular maze, not mazes in general.
### Multiple Areas
In many of the example environments, many copies of the training area are
instantiated in the scene. This generally speeds up training, allowing the
environment to gather many experiences in parallel. This can be achieved simply
by instantiating many Agents with the same Behavior Name. If possible, consider
designing your scene to support multiple areas.
Check out our example environments to see examples of multiple areas.
Additionally, the
[Making a New Learning Environment](Learning-Environment-Create-New.md#optional-multiple-training-areas-within-the-same-scene)
guide demonstrates this option.
## Environments
When you create a training environment in Unity, you must set up the scene so
that it can be controlled by the external training process. Considerations
include:
- The training scene must start automatically when your Unity application is
launched by the training process.
- The Academy must reset the scene to a valid starting point for each episode of
training.
- A training episode must have a definite end — either using `Max Steps` or by
each Agent ending its episode manually with `EndEpisode()`.
## Environment Parameters
Curriculum learning and environment parameter randomization are two training
methods that control specific parameters in your environment. As such, it is

[WallJumpAgent.cs](../Project/Assets/ML-Agents/Examples/WallJump/Scripts/WallJumpAgent.cs)
).
## Agent
The Agent class represents an actor in the scene that collects observations and
carries out actions. The Agent class is typically attached to the GameObject in

See [Agents](Learning-Environment-Design-Agents.md) for detailed information
about programming your own Agents.
## Recording Statistics

docs/Learning-Environment-Examples.md (94 lines changed)


- +0.1 Each step agent's hand is in goal location.
- Behavior Parameters:
- Vector Observation space: 26 variables corresponding to position, rotation,
  velocity, and angular velocities of the two arm rigid bodies.
- Vector Action space: (Continuous) Size of 4, corresponding to torque
applicable to two joints.
- Visual Observations: None.

![Worm](images/worm.png)
- Set-up: A worm with a head and 3 body segments.
- Goal: The agents must move their bodies toward the goal direction.
  - `WormStaticTarget` - Goal direction is always forward.
  - `WormDynamicTarget` - Goal direction is randomized.
- Agents: The environment contains 10 agents with the same Behavior Parameters.
- Agent Reward Function (independent):
  - +0.01 times body velocity in the goal direction.
  - +0.01 times body direction alignment with goal direction.
- Behavior Parameters:
  - Vector Observation space: 57 variables corresponding to position, rotation,
  - Vector Action space: (Continuous) Size of 9, corresponding to target
- Visual Observations: None
- Float Properties: None
- Benchmark Mean Reward for `WormStaticTarget`: 200
- Benchmark Mean Reward for `WormDynamicTarget`: 150
## Food Collector

- Set-up: Environment where four agents compete in a 2 vs 2 toy soccer game.
- Goal:
  - Get the ball into the opponent's goal while preventing the ball from entering own goal.
- Agents: The environment contains four agents, with the same Behavior Parameters: SoccerTwos.
- (1 - `accumulated time penalty`) When ball enters opponent's goal. `accumulated time penalty` is incremented by (1 / `MaxStep`) every fixed update and is reset to 0 at the beginning of an episode.
- -1 When ball enters team's goal.
- Vector Observation space: 336 corresponding to 11 ray-casts forward distributed over 120 degrees and 3 ray-casts backward distributed over 90 degrees, each detecting 6 possible object types, along with the object's distance. The forward ray-casts contribute 264 state dimensions and backward 72 state dimensions over three observation stacks.
- Vector Action space: (Discrete) Three branched actions corresponding to forward, backward, sideways movement, as well as rotation.
- ball_scale: Specifies the scale of the ball in the 3 dimensions (equal across the three dimensions)
- Default: 7.5
- Recommended minimum: 4
- Recommended maximum: 10

- Agents: The environment contains three agents. Two Strikers and one Goalie.
Behavior Parameters : Striker, Goalie.
- Striker Agent Reward Function (dependent):
  - +1 When ball enters opponent's goal.
  - -0.001 Existential penalty.
- Goalie Agent Reward Function (dependent):
  - -1 When ball enters goal.
  - 0.001 Existential bonus.
- Striker Vector Observation space: 294 corresponding to 11 ray-casts forward distributed over 120 degrees and 3 ray-casts backward distributed over 90 degrees, each detecting 5 possible object types, along with the object's distance. The forward ray-casts contribute 231 state dimensions and backward 63 state dimensions over three observation stacks.
- Striker Vector Action space: (Discrete) Three branched actions corresponding to forward, backward, sideways movement, as well as rotation.
- Goalie Vector Observation space: 738 corresponding to 41 ray-casts distributed over 360 degrees, each detecting 4 possible object types, along with the object's distance and 3 observation stacks.
- Goalie Vector Action space: (Discrete) Three branched actions corresponding to forward, backward, sideways movement, as well as rotation.
- ball_scale: Specifies the scale of the ball in the 3 dimensions (equal across the three dimensions)
- Default: 7.5
- Recommended minimum: 4
- Recommended maximum: 10

- Recommended maximum: 20
## Walker

docs/Learning-Environment-Executable.md (2 lines changed)


INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 10000. Mean Reward: 27.284. Std of Reward: 28.667. Training.
```
You can press `Ctrl+C` to stop the training, and your trained model will be at
`models/<run-identifier>/<behavior_name>.nn`, which corresponds to your model's
latest checkpoint. (**Note:** There is a known bug on Windows that causes the
saving of the model to fail when you early terminate the training; it's

docs/Limitations.md (8 lines changed)


See the package-specific Limitations pages:
- [`com.unity.mlagents` Unity package](../com.unity.ml-agents/Documentation~/com.unity.ml-agents.md#known-limitations)
- [`mlagents` Python package](../ml-agents/README.md#limitations)
- [`mlagents_envs` Python package](../ml-agents-envs/README.md#limitations)
- [`gym_unity` Python package](../gym-unity/README.md#limitations)

docs/ML-Agents-Overview.md (100 lines changed)


- [Key Components](#key-components)
- [Training Modes](#training-modes)
- [Built-in Training and Inference](#built-in-training-and-inference)
- [Cross-Platform Inference](#cross-platform-inference)
- [Custom Training and Inference](#custom-training-and-inference)
- [Flexible Training Scenarios](#flexible-training-scenarios)
- [Training Methods: Environment-agnostic](#training-methods-environment-agnostic)

processes these observations and sends back actions for each medic to take. During training these actions are mostly exploratory to help the Python API learn the best policy for each medic. Once training concludes, the learned policy for each medic can be exported as a model file. Then during the inference phase, the medics still continue to generate their observations, but instead of being sent to the Python API, they will be fed into their (internal, embedded) model to generate the _optimal_ action for each medic to take at every point in time.
#### Cross-Platform Inference
It is important to note that the ML-Agents Toolkit leverages the
[Unity Inference Engine](Unity-Inference-Engine.md) to run the models within a
Unity scene such that an agent can take the _optimal_ action at each step. Given
that the Unity Inference Engine supports most platforms that Unity does, this
means that any model you train with the ML-Agents Toolkit can be embedded into
your Unity application that runs on any platform. See our
[dedicated blog post](https://blogs.unity3d.com/2019/03/01/unity-ml-agents-toolkit-v0-7-a-leap-towards-cross-platform-inference/)
for additional information.
### Custom Training and Inference
In the previous mode, the Agents were used for training to generate a TensorFlow

and saved as assets. These demonstrations contain information on the
observations, actions, and rewards for a given agent during the recording
session. They can be managed in the Editor, as well as used for training with BC
and GAIL. See the
[Designing Agents](Learning-Environment-Design-Agents.md#recording-demonstrations)
page for more information on how to record demonstrations for your agent.
### Summary

experience replay mechanism used by SAC. Thus, we recommend that users use PPO.
For further reading on this issue in particular, see the paper
[Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning](https://arxiv.org/pdf/1702.08887.pdf).
See our
[Designing Agents](Learning-Environment-Design-Agents.md#defining-teams-for-multi-agent-scenarios)
page for more information on setting up teams in your Unity scene. Also, read
our
[blog post on self-play](https://blogs.unity3d.com/2020/02/28/training-intelligent-adversaries-using-self-play-with-ml-agents/)
for additional information.
### Solving Complex Tasks using Curriculum Learning

is not possible to directly replicate the results here using that environment.]_
The ML-Agents Toolkit supports modifying custom environment parameters during the training process to aid in learning. This allows elements of the environment related to difficulty or complexity to be dynamically adjusted based on training progress. The [Training ML-Agents](Training-ML-Agents.md#curriculum-learning) page has more information on defining training curriculums.
### Training Robust Agents using Environment Parameter Randomization

learns, the ML-Agents Toolkit provides a way to randomly sample parameters of
the environment during training. We refer to this approach as **Environment
Parameter Randomization**. For those familiar with Reinforcement Learning
research, this approach is based on the concept of [Domain Randomization](https://arxiv.org/abs/1703.06907). By using [parameter randomization during training](Training-ML-Agents.md#environment-parameter-randomization), the agent can be better suited to adapt (with higher performance) to future unseen variations of the environment.

_Example of variations of the 3D Ball environment. The environment parameters are `gravity`, `ball_mass` and `ball_scale`._
## Model Types

- **Concurrent Unity Instances** - We enable developers to run concurrent, parallel instances of the Unity executable during training. For certain scenarios, this should speed up training. Check out our dedicated page on [creating a Unity executable](Learning-Environment-Executable.md) and the [Training ML-Agents](Training-ML-Agents.md#training-using-concurrent-unity-instances) page for instructions on how to set the number of concurrent instances.
- **Recording Statistics from Unity** - We enable developers to [record statistics](Learning-Environment-Design.md#recording-statistics) from within their Unity environments. These statistics are aggregated and generated during the training process.
- **Custom Side Channels** - We enable developers to [create custom side channels](Custom-SideChannels.md) to manage data transfer between Unity and Python that is unique to their training workflow and/or environment.
- **Custom Samplers** - We enable developers to [create custom sampling methods](Training-ML-Agents.md#defining-a-new-sampler-type) for Environment Parameter Randomization. This enables users to customize this training method for their particular environment.

packed with several features to enable researchers and developers to leverage
(and enhance) machine learning within Unity.
In terms of next steps:
- For a walkthrough of running ML-Agents with a simple scene, check out the
[Getting Started](Getting-Started.md) guide.
- For a "Hello World" introduction to creating your own Learning Environment,
check out the
[Making a New Learning Environment](Learning-Environment-Create-New.md) page.
- For an overview on the more complex example environments that are provided in
this toolkit, check out the
[Example Environments](Learning-Environment-Examples.md) page.
- For more information on the various training options available, check out the
[Training ML-Agents](Training-ML-Agents.md) page.

docs/Profiling-Python.md (68 lines changed)


# Profiling in Python

As part of the ML-Agents Toolkit, we provide a lightweight profiling system, in order to identify hotspots in the training process and help spot regressions from changes.
Timers are hierarchical, meaning that the time tracked in a block of code can be further split into other blocks if desired. This also means that a function that is called from multiple places in the code will appear in multiple places in the timing output.
All timers operate using a "global" instance by default, but this can be overridden if necessary (mainly for testing).
All timers operate using a "global" instance by default, but this can be
overridden if necessary (mainly for testing).
There are two ways to indicate code should be included in profiling. The simplest way is to add the `@timed` decorator to a function or method of interest.
```python
class TrainerController:
    # ...
```
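As an illustration, a minimal sketch of the decorator form might look like the following. It assumes the timer utilities are importable from `mlagents_envs.timers`, and the decorated function itself is hypothetical:

```python
from mlagents_envs.timers import timed


@timed
def update_policy(batch_size: int) -> int:
    # Everything inside this function is attributed to an "update_policy"
    # block in the timer tree.
    return sum(range(batch_size))
```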

You can also use the `hierarchical_timer` context manager.
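A short illustrative sketch of the context-manager form, again assuming the utilities live in `mlagents_envs.timers` (the function and block name below are hypothetical):

```python
from mlagents_envs.timers import hierarchical_timer


def load_config(path: str) -> str:
    # Time just the file read, as a child block of whatever timer is active.
    with hierarchical_timer("load_config.read"):
        with open(path) as f:
            return f.read()
```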
The context manager may be easier than the `@timed` decorator for profiling different parts of a large function, or profiling calls to abstract methods that might not use the decorator.
By default, at the end of training, timers are collected and written in JSON format to `{summaries_dir}/{run_id}_timers.json`. The output consists of node objects with the following keys:

- total (float): The total time in seconds spent in the block, including child calls.
- count (int): The number of times the block was called.
- self (float): The total time in seconds spent in the block, excluding child calls.
- children (dictionary): A dictionary of child nodes, keyed by the node name.
- is_parallel (bool): Indicates that the block of code was executed in multiple threads or processes (see below). This is optional and defaults to false.
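As a hypothetical, abbreviated illustration of this structure, the file might look like the following once loaded with `json.load()` in Python (block names and numbers are made up):

```python
example_timer_tree = {
    "total": 120.5,   # seconds in the root block, including children
    "count": 1,
    "self": 2.3,      # time not attributed to any child block
    "children": {
        "TrainerController.advance": {
            "total": 118.2,
            "count": 30000,
            "self": 10.1,
            "children": {},
        }
    },
}
```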
For code that executes in multiple processes (for example, SubprocessEnvManager), we periodically send the timer information back to the "main" process, aggregate the timers there, and flush them in the subprocess. Note that (depending on the number of processes) this can result in timers where the total time may exceed the parent's total time. This is analogous to the difference between "real" and "user" values reported from the unix `time` command. In the timer output, blocks that were run in parallel are indicated by the `is_parallel` flag.
Timers currently use `time.perf_counter()` to track time spent, which may not give accurate results for multiple threads. If this is problematic, set `threaded: false` in your trainer configuration.

docs/Readme.md (14 lines changed)


- [Making a New Learning Environment](Learning-Environment-Create-New.md)
- [Designing a Learning Environment](Learning-Environment-Design.md)
- [Designing Agents](Learning-Environment-Design-Agents.md)
### Advanced Usage
## Training & Inference
- [Training Configuration File](Training-Configuration-File.md)
- [Using TensorBoard to Observe Training](Using-Tensorboard.md)
- [Creating Custom Samplers for Environment Parameter Randomization](Training-ML-Agents.md#defining-a-new-sampler-type)
## Help

docs/Unity-Inference-Engine.md (60 lines changed)


# Unity Inference Engine
The ML-Agents Toolkit allows you to use pre-trained neural network models inside your Unity games. This support is possible thanks to the [Unity Inference Engine](https://docs.unity3d.com/Packages/com.unity.barracuda@latest/index.html) (codenamed Barracuda). The Unity Inference Engine uses [compute shaders](https://docs.unity3d.com/Manual/class-ComputeShader.html) to run the neural network within Unity.
**Note**: The ML-Agents Toolkit only supports the models created with our
See the Unity Inference Engine documentation for a list of the
[supported platforms](https://docs.unity3d.com/Packages/com.unity.barracuda@latest/index.html#supported-platforms).
Scripting Backends: The Unity Inference Engine is generally faster with
**IL2CPP** than with **Mono** for Standalone builds. In the Editor, it is not
possible to use the Unity Inference Engine with the GPU device selected when the
Editor Graphics Emulation is set to **OpenGL(ES) 3.0 or 2.0 emulation**. Also,
there might be non-fatal build-time errors when the target platform includes a
Graphics API that does not support **Unity Compute Shaders**.
- Barracuda (`.nn`) files use a proprietary format produced by the
[`tensorflow_to_barracuda.py`]() script.
- ONNX (`.onnx`) files use an
[industry-standard open format](https://onnx.ai/about.html) produced by the
[tf2onnx package](https://github.com/onnx/tensorflow-onnx).
Export to ONNX is currently considered beta. To enable it, make sure
`tf2onnx>=1.5.5` is installed in pip. tf2onnx does not currently support
tensorflow 2.0.0 or later, or earlier than 1.12.0.
When using a model, drag the model file into the **Model** field in the
Inspector of the Agent. Select the **Inference Device** (CPU or GPU) you want to
use for Inference.
**Note:** For most of the models generated with the ML-Agents Toolkit, CPU will
be faster than GPU. You should use the GPU only if you use the ResNet visual
encoder or have a large number of agents with visual observations.

158
gym-unity/README.md


For more information on the gym interface, see [here](https://github.com/openai/gym).
We provide a gym wrapper and instructions for using it with existing machine
learning algorithms which utilize gym. Our wrapper provides interfaces on top of
our `UnityEnvironment` class, which is the default way of interfacing with a
Unity environment via Python.
## Installation

```sh
pip3 install gym_unity
pip3 install -e .
```
## Using the Gym Wrapper

env = UnityToGymWrapper(unity_environment, use_visual, uint8_visual)
```
- `unity_environment` refers to the Unity environment to be wrapped.
- `use_visual` refers to whether to use visual observations (True) or vector
observations (False) as the default observation provided by the `reset` and
`step` functions. Defaults to `False`.
- `uint8_visual` refers to whether to output visual observations as `uint8`
values (0-255). Many common Gym environments (e.g. Atari) do this. By default
they will be floats (0.0-1.0). Defaults to `False`.
- `flatten_branched` will flatten a branched discrete action space into a Gym
Discrete. Otherwise, it will be converted into a MultiDiscrete. Defaults to
`False`.
- `allow_multiple_visual_obs` will return a list of visual observations instead
  of only the first one when set to `True`. Defaults to `False`.
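
Putting these options together, a minimal usage sketch might look like the
following. It assumes a built GridWorld executable at `./envs/GridWorld` and the
constructor arguments documented above; adjust the path and options to your own
environment.

```python
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper

# Wrap a built Unity environment as a gym environment.
unity_env = UnityEnvironment("./envs/GridWorld")
env = UnityToGymWrapper(unity_env, use_visual=True, uint8_visual=True)

obs = env.reset()
for _ in range(100):
    # Sample random actions from the gym action space.
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()
```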
- It is only possible to use an environment with a **single** Agent.
- By default, the first visual observation is provided as the `observation`, if
  present. Otherwise, vector observations are provided. You can receive all
  visual observations by using the `allow_multiple_visual_obs=True` option in
  the gym parameters. If set to `True`, you will receive a list of observations
  instead of only the first one.
- The `TerminalSteps` or `DecisionSteps` output from the environment can still
be accessed from the `info` provided by `env.step(action)`.
- Stacked vector observations are not supported.
- Environment registration for use with `gym.make()` is currently not supported.
## Running OpenAI Baselines Algorithms

### Other Algorithms
Other algorithms in the Baselines repository can be run using scripts similar to
the examples from the baselines package. In most cases, the primary changes
needed to use a Unity environment are to import `UnityToGymWrapper`, and to
replace the environment creation code, typically `gym.make()`, with a call to
`UnityToGymWrapper(unity_environment)` passing the environment as input.
A typical rule of thumb is that for vision-based environments, modification

Some algorithms will make use of `make_env()` or `make_mujoco_env()` functions.
You can define a similar function for Unity environments. An example of such a
method using the PPO2 baseline:
```python
from mlagents_envs.environment import UnityEnvironment

## Run Google Dopamine Algorithms
Google provides a framework [Dopamine](https://github.com/google/dopamine), and
implementations of algorithms, e.g. DQN, Rainbow, and the C51 variant of
Rainbow. Using the Gym wrapper, we can run Unity environments using Dopamine.
First, after installing the Gym wrapper, clone the Dopamine repository.

Then, follow the appropriate install instructions as specified on
[Dopamine's homepage](https://github.com/google/dopamine). Note that the
Dopamine guide specifies using a virtualenv. If you choose to do so, make sure
your unity_env package is also installed within the same virtualenv as Dopamine.
First, open `dopamine/atari/run_experiment.py`. Alternatively, copy the entire
`atari` folder, and name it something else (e.g. `unity`). If you choose the
copy approach, be sure to change the package names in the import statements in
`train.py` to your new directory.
Within `run_experiment.py`, we will need to make changes to which environment is
instantiated, just as in the Baselines example. At the top of the file, insert

from gym_unity.envs import UnityToGymWrapper
```
to import the Gym Wrapper. Navigate to the `create_atari_environment` method in
the same file, and switch to instantiating a Unity environment by replacing the
method with the following code.
```python
game_version = 'v0' if sticky_actions else 'v4'

return env
```
`./envs/GridWorld` is the path to your built Unity executable. For more
information on building Unity environments, see
[here](../docs/Learning-Environment-Executable.md), and note the Limitations
section below.
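
For illustration, the shape of such a replacement method might be the following
sketch; the executable path and wrapper options here are assumptions based on
the surrounding text, not a definitive implementation.

```python
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper

def create_atari_environment(game_name=None, sticky_actions=True):
    # The Atari-specific arguments are ignored; a built Unity executable is
    # wrapped as a gym environment instead.
    unity_env = UnityEnvironment("./envs/GridWorld")
    return UnityToGymWrapper(unity_env, use_visual=True, uint8_visual=True)
```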
Note that we are not using the preprocessor from Dopamine, as it uses many
Atari-specific calls. Furthermore, frame-skipping can be done from within Unity,
Since Dopamine is designed around variants of DQN, it is only compatible with
discrete action spaces, and specifically the Discrete Gym space. For
environments that use branched discrete action spaces (e.g.
`flatten_branched` parameter in `UnityToGymWrapper`, which treats each
combination of branched actions as separate actions.
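
As a rough illustration of what flattening means (this is the idea only, not the
wrapper's internal code): a branched space such as `MultiDiscrete([3, 2])`
becomes a single `Discrete(6)` whose actions index every combination of the
branches.

```python
from itertools import product
from gym import spaces

# A branched action space with two branches of sizes 3 and 2.
branched = spaces.MultiDiscrete([3, 2])

# Flattening: enumerate every combination of branch values and expose them as a
# single Discrete action space.
lookup = list(product(*(range(n) for n in branched.nvec)))
flat = spaces.Discrete(len(lookup))

print(flat)       # Discrete(6)
print(lookup[4])  # flat action 4 corresponds to branch choices (2, 0)
```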
Furthermore, when building your environments, ensure that your Agent is using
visual observations with greyscale enabled, and that the dimensions of the
visual observations are 84 by 84 (matching the parameter found in `dqn_agent.py`
and `rainbow_agent.py`). Dopamine's agents currently do not automatically adapt
to the observation dimensions or number of channels.
The hyperparameters provided by Dopamine are tailored to the Atari games, and
you will likely need to adjust them for ML-Agents environments. Here is a sample
`dopamine/agents/rainbow/configs/rainbow.gin` file that is known to work with
GridWorld.

This example assumed you copied `atari` to a separate folder named `unity`.
Replace `unity` in `import dopamine.unity.run_experiment` with the folder you
copied your `run_experiment.py` and `trainer.py` files to. If you directly
modified the existing files, then use `atari` here.
### Starting a Run

--gin_files='dopamine/agents/rainbow/configs/rainbow.gin'
```
Again, we assume that you've copied `atari` into a separate folder. Remember to
replace `unity` with the directory you copied your files into. If you edited the
Atari files directly, this should be `atari`.
Dopamine as run on the GridWorld example environment. All Dopamine (DQN,
Rainbow, C51) runs were done with the same epsilon, epsilon decay, replay
history, training steps, and buffer settings as specified above. Note that the
first 20000 steps are used to pre-fill the training buffer, and no learning
happens.
We provide results from our PPO implementation and the DQN from Baselines as
reference. Note that all runs used the same greyscale GridWorld as Dopamine. For
PPO, `num_layers` was set to 2, and all other hyperparameters are the default
for GridWorld in `trainer_config.yaml`. For Baselines DQN, the provided
hyperparameters in the previous section are used. Note that Baselines implements
certain features (e.g. dueling-Q) that are not enabled in Dopamine DQN.
![Dopamine on GridWorld](images/dopamine_gridworld_plot.png)

algorithm to train on the VisualBanana environment, and provide the results
below. The same hyperparameters were used as in the GridWorld case, except that
`replay_history` and `epsilon_decay` were increased to 100000.
![Dopamine on VisualBanana](images/dopamine_visualbanana_plot.png)

37
ml-agents-envs/README.md


The `mlagents_envs` Python package is part of the
[ML-Agents Toolkit](https://github.com/Unity-Technologies/ml-agents).
`mlagents_envs` provides a Python API that allows direct interaction with the
Unity game engine. It is used by the trainer implementation in `mlagents` as
well as the `gym-unity` package to perform reinforcement learning within Unity.
`mlagents_envs` can be used independently of `mlagents` for Python
communication.
## Installation

pip3 install mlagents_envs
See the [Python API Guide](../docs/Python-API.md) for more information on how to
use the API to interact with a Unity environment.
For more information on the ML-Agents Toolkit and how to instrument a Unity
scene with the ML-Agents SDK, check out the main
[ML-Agents Toolkit documentation](../docs/Readme.md).
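
As a quick illustration of the low-level API, a minimal connection sketch is
shown below. It assumes a built GridWorld executable, and the accessor names
follow the Python API Guide for this release; consult that guide if they differ
in your installed version.

```python
from mlagents_envs.environment import UnityEnvironment

# Use a distinct worker_id per concurrent environment so the localhost ports
# used for communication do not collide (see the limitations below).
env = UnityEnvironment(file_name="./envs/GridWorld", worker_id=0)
env.reset()

behavior_name = env.get_behavior_names()[0]
decision_steps, terminal_steps = env.get_steps(behavior_name)
print(f"{behavior_name}: {len(decision_steps)} agent(s) requesting a decision")

env.close()
```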
- `mlagents_envs` uses localhost ports to exchange data between Unity and
Python. As such, multiple instances can have their ports collide, leading to
errors. Make sure to use a different port if you are using multiple instances
of `UnityEnvironment`.
- Communication between Unity and the Python `UnityEnvironment` is not secure.
- On Linux, ports are not released immediately after the communication closes.
As such, you cannot reuse ports right after closing a `UnityEnvironment`.

35
ml-agents/README.md


# Unity ML-Agents Trainers
The `mlagents` Python package is part of the
[ML-Agents Toolkit](https://github.com/Unity-Technologies/ml-agents). `mlagents`
provides a set of reinforcement and imitation learning algorithms designed to be
used with Unity environments. The algorithms interface with the Python API
provided by the `mlagents_envs` package. See [here](../docs/Python-API.md) for
more information on `mlagents_envs`.
The algorithms can be accessed using the `mlagents-learn` access point. See
[here](../docs/Training-ML-Agents.md) for more information on using this
package.
## Installation

pip3 install mlagents
For more information on the ML-Agents Toolkit and how to instrument a Unity
scene with the ML-Agents SDK, check out the main
[ML-Agents Toolkit documentation](../docs/Readme.md).
- `mlagents` does not yet explicitly support multi-agent scenarios, so training
  cooperative behavior among different agents is not stable.
- Resuming self-play from a checkpoint resets the reported ELO to the default
  value.
- Resuming curriculum learning from a checkpoint requires the last lesson be
  specified using the `--lesson` CLI option.