
Merge pull request #2515 from Unity-Technologies/ab/docs-updates

Changes to documentation
/develop-gpu-test
GitHub, 5 years ago
Current commit: b444c1a5
10 files changed, with 158 additions and 597 deletions
  1. docs/Installation.md (17 lines changed)
  2. docs/ML-Agents-Overview.md (16 lines changed)
  3. docs/Readme.md (25 lines changed)
  4. docs/Training-ML-Agents.md (107 lines changed)
  5. docs/Training-on-Amazon-Web-Service.md (3 lines changed)
  6. docs/Training-on-Microsoft-Azure.md (11 lines changed)
  7. docs/Using-Tensorboard.md (6 lines changed)
  8. docs/Using-Virtual-Environment.md (53 lines changed)
  9. docs/Using-Docker.md (166 lines changed)
  10. docs/Installation-Windows.md (351 lines changed)

docs/Installation.md (17 lines changed)


## Windows Users
For Windows, we have created a [detailed guide](Installation-Windows.md) to
setting up your environment. For Mac and Linux, continue with this guide.
## Environment Setup
We now support a single mechanism for installing ML-Agents on Mac/Windows/Linux using Virtual
Environments. For more information on Virtual Environments and installation instructions,
follow this [guide](Using-Virtual-Environment.md).
### Clone the ML-Agents Toolkit Repository

Running pip with the `-e` flag will let you make changes to the Python files directly and have those
reflected when you run `mlagents-learn`. It is important to install these packages in this order as the
`mlagents` package depends on `mlagents_envs`, and installing it in the other
order will download `mlagents_envs` from PyPi.
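For example, a minimal sketch of that install order, assuming you start from the root of the
cloned repository:
```sh
# Install the packages in editable mode, mlagents_envs first so that the
# subsequent mlagents install does not pull mlagents_envs from PyPi.
cd ml-agents-envs
pip3 install -e .
cd ../ml-agents
pip3 install -e .
```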
## Docker-based Installation
If you'd like to use Docker for ML-Agents, please follow
[this guide](Using-Docker.md).
## Next Steps

docs/ML-Agents-Overview.md (16 lines changed)


[Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
to learn more about this feature.
- **Docker Set-up (Experimental)** - To facilitate setting up ML-Agents without
installing Python or TensorFlow directly, we provide a
[guide](Using-Docker.md) on how to create and run a Docker container.
- **Broadcasting** - As discussed earlier, a Learning Brain sends the
observations for all its Agents to the Python API when dragged into the
Academy's `Broadcast Hub` with the `Control` checkbox checked. This is helpful
for training and later inference. Broadcasting is a feature which can be
enabled for all types of Brains (Player, Learning, Heuristic) where the Agent
observations and actions are also sent to the Python API (despite the fact
that the Agent is **not** controlled by the Python API). This feature is
leveraged by Imitation Learning, where the observations and actions for a
Player Brain are used to learn the policies of an agent through demonstration.
However, this could also be helpful for the Heuristic and Learning Brains,
particularly when debugging agent behaviors. You can learn more about using
the broadcasting feature
[here](Learning-Environment-Design-Brains.md#using-the-broadcast-feature).
- **Cloud Training on AWS** - To facilitate using the ML-Agents toolkit on
Amazon Web Services (AWS) machines, we provide a

docs/Readme.md (25 lines changed)


* [Installation](Installation.md)
* [Background: Jupyter Notebooks](Background-Jupyter.md)
* [Docker Set-up](Using-Docker.md)
* [Using Virtual Environment](Using-Virtual-Environment.md)
* [Basic Guide](Basic-Guide.md)
## Getting Started

[Heuristic](Learning-Environment-Design-Heuristic-Brains.md),
[Learning](Learning-Environment-Design-Learning-Brains.md)
* [Learning Environment Best Practices](Learning-Environment-Best-Practices.md)
### Advanced Usage
* [Using the Monitor](Feature-Monitor.md)
* [Using the Video Recorder](https://github.com/Unity-Technologies/video-recorder)
* [Using an Executable Environment](Learning-Environment-Executable.md)
* [Creating Custom Protobuf Messages](Creating-Custom-Protobuf-Messages.md)
* [Using TensorBoard to Observe Training](Using-Tensorboard.md)
* [Training Using Concurrent Unity Instances](Training-Using-Concurrent-Unity-Instances.md)
### Advanced Training Methods
### Cloud Training (Deprecated)
Here are the cloud training set-up guides for Azure and AWS. We no longer use them ourselves and
so they may not work correctly. We've decided to keep them up just in case they are helpful to
you.
## Inference

docs/Training-ML-Agents.md (107 lines changed)


using TensorBoard during or after training by running the following command:
```sh
tensorboard --logdir=summaries --port 6006
```
**Note:** The default port TensorBoard uses is 6006. If there is an existing session
running on port 6006, a new session can be launched on an open port using the `--port`
option.
When training is finished, you can find the saved model in the `models` folder
under the assigned run-id — in the cats example, the path to the model would be

the oldest checkpoint is deleted when saving a new checkpoint. Defaults to 5.
* `--lesson=<n>`: Specify which lesson to start with when performing curriculum
training. Defaults to 0.
* `--load`: If set, the training code loads an already trained model to
initialize the neural network before training. The learning code looks for the
model in `models/<run-id>/` (which is also where it saves models at the end of
training). When not set (the default), the neural network weights are randomly
initialized and an existing model is not loaded.
* `--num-envs=<n>`: Specifies the number of concurrent Unity environment instances to
collect experiences from when training. Defaults to 1.
* `--run-id=<path>`: Specifies an identifier for each training run. This
identifier is used to name the subdirectories in which the trained model and
summary statistics are saved as well as the saved model itself. The default id

training. Defaults to 50000.
* `--seed=<n>`: Specifies a number to use as a seed for the random number
generator used by the training code.
* `--env-args=<string>`: Specify arguments for the executable environment. Be aware that
the standalone build will also process these as
[Unity Command Line Arguments](https://docs.unity3d.com/Manual/CommandLineArguments.html).
You should choose different argument names if you want to create environment-specific arguments.
All arguments after this flag will be passed to the executable. For example, setting
`mlagents-learn config/trainer_config.yaml --env-args --num-orcs 42` would result in
` --num-orcs 42` passed to the executable.
* `--base-port`: Specifies the starting port. Each concurrent Unity environment instance
will get assigned a port sequentially, starting from the `base-port`. Each instance
will use the port `(base_port + worker_id)`, where the `worker_id` is sequential IDs
given to each instance from 0 to `num_envs - 1`. Default is 5005.
* `--slow`: Specify this option to run the Unity environment at normal, game
speed. The `--slow` mode uses the **Time Scale** and **Target Frame Rate**
specified in the Academy's **Inference Configuration**. By default, training

* `--train`: Specifies whether to train the model or only run in inference mode.
When training, **always** use the `--train` option.
* `--docker-target-name=<dt>`: The Docker Volume on which to store curriculum,
executable and model files. See [Using Docker](Using-Docker.md).
* `--no-graphics`: Specify this option to run the Unity executable in
`-batchmode` without initializing the graphics driver. Use this only if your
training doesn't involve visual observations (reading from Pixels). See

* `--multi-gpu`: Setting this flag enables the use of multiple GPUs (if available) during training.
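As an illustration of how several of these options combine, a hypothetical invocation might look
like this (the executable path `builds/3DBall` and the run-id are made-up examples):
```sh
# Train against 3 concurrent copies of a hypothetical Linux/macOS build,
# using ports 5005, 5006 and 5007 (base_port + worker_id), and resuming from
# any model previously saved under models/3dball_run/.
mlagents-learn config/trainer_config.yaml \
  --env=builds/3DBall \
  --num-envs=3 \
  --base-port=5005 \
  --run-id=3dball_run \
  --load \
  --train
```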
`config/gail_config.yaml` and `config/offline_bc_config.yaml` specify the training method,
the hyperparameters, and a few additional values to use when training with Proximal Policy
Optimization (PPO), Soft Actor-Critic (SAC), GAIL (Generative Adversarial Imitation Learning)
with PPO, and online and offline Behavioral Cloning (BC)/Imitation. These files are divided
into sections. The **default** section defines the default values for all the available
settings. You can also add new sections to override these defaults to train specific Brains.
Name each of these override sections after the GameObject containing the Brain component that
should use these settings. (This GameObject will be a child of the Academy in your scene.)
Sections for the example environments are included in the provided config file.
| **Setting** | **Description** | **Applies To Trainer\*** |
| :-- | :-- | :-- |
| batch_size | The number of experiences in each iteration of gradient descent. | PPO, SAC, BC |
| buffer_size | The number of experiences to collect before updating the policy model. In SAC, the max size of the experience buffer. | PPO, SAC |
| buffer_init_steps | The number of experiences to collect into the buffer before updating the policy model. | SAC |
| hidden_units | The number of units in the hidden layers of the neural network. | PPO, SAC, BC |
| init_entcoef | How much the agent should explore in the beginning of training. | SAC |
| learning_rate | The initial learning rate for gradient descent. | PPO, SAC, BC |
| max_steps | The maximum number of simulation steps to run during a training session. | PPO, SAC, BC |
| memory_size | The size of the memory an agent must keep. Used for training with a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, SAC, BC |
| normalize | Whether to automatically normalize observations. | PPO, SAC |
| num_layers | The number of hidden layers in the neural network. | PPO, SAC, BC |
| pretraining | Use demonstrations to bootstrap the policy neural network. See [Pretraining Using Demonstrations](Training-PPO.md#optional-pretraining-using-demonstrations). | PPO, SAC |
| reward_signals | The reward signals used to train the policy. Enable Curiosity and GAIL here. See [Reward Signals](Reward-Signals.md) for configuration options. | PPO, SAC, BC |
| save_replay_buffer | Saves the replay buffer when exiting training, and loads it on resume. | SAC |
| sequence_length | Defines how long the sequences of experiences must be while training. Only used for training with a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, SAC, BC |
| summary_freq | How often, in steps, to save training statistics. This determines the number of data points shown by TensorBoard. | PPO, SAC, BC |
| tau | How aggressively to update the target network used for bootstrapping value estimation in SAC. | SAC |
| time_horizon | How many steps of experience to collect per-agent before adding it to the experience buffer. | PPO, SAC, (online)BC |
| trainer | The type of training to perform: "ppo", "sac", "offline_bc" or "online_bc". | PPO, SAC, BC |
| train_interval | How often to update the agent. | SAC |
| num_update | Number of mini-batches to update the agent with during each update. | SAC |
| use_recurrent | Train using a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, SAC, BC |
\*PPO = Proximal Policy Optimization, SAC = Soft Actor-Critic, BC = Behavioral Cloning (Imitation)
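To make the default/override structure concrete, here is a minimal, illustrative sketch of such a
config file, written out from the shell to match the other command examples. The section name
`3DBallLearning` and all values are examples, not the shipped defaults.
```sh
# Illustrative only: the Brain section name and every value are examples,
# not the defaults that ship with ML-Agents.
cat > example_trainer_config.yaml <<'EOF'
default:
    trainer: ppo
    batch_size: 1024
    buffer_size: 10240
    hidden_units: 128
    learning_rate: 3.0e-4
    max_steps: 5.0e4
    normalize: false
    num_layers: 2
    summary_freq: 1000
    time_horizon: 64

3DBallLearning:
    normalize: true
    batch_size: 64
    time_horizon: 1000
EOF
```
An override section only needs to list the settings it changes; anything not listed falls back to
the values in **default**.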

docs/Training-on-Amazon-Web-Service.md (3 lines changed)


# Training on Amazon Web Service
Note: We no longer use this guide ourselves and so it may not work correctly. We've
decided to keep it up just in case it is helpful to you.
This page contains instructions for setting up an EC2 instance on Amazon Web
Service for training ML-Agents environments.

docs/Training-on-Microsoft-Azure.md (11 lines changed)


# Training on Microsoft Azure (works with ML-Agents toolkit v0.3)
Note: We no longer use this guide ourselves and so it may not work correctly. We've
decided to keep it up just in case it is helpful to you.
This page contains instructions for setting up training on Microsoft Azure
through either
[Azure Container Instances](https://azure.microsoft.com/services/container-instances/)

[Azure Container Instances](https://azure.microsoft.com/services/container-instances/)
allow you to spin up a container, on demand, that will run your training and
then be shut down. This ensures you aren't leaving a billable VM running when
it isn't needed. You can read more about
[The ML-Agents toolkit support for Docker containers here](Using-Docker.md).
Using ACI enables you to offload training of your models without needing to
install Python and TensorFlow on your own computer. You can find instructions,
including a pre-deployed image in DockerHub for you to use, available
[here](https://github.com/druttka/unity-ml-on-azure).

docs/Using-Tensorboard.md (6 lines changed)


3. From the command line run:
```sh
tensorboard --logdir=summaries --port=6006
```
**Note:** The default port TensorBoard uses is 6006. If there is an existing session
running on port 6006, a new session can be launched on an open port using the `--port`
option.
**Note:** If you don't assign a `run-id` identifier, `mlagents-learn` uses the
default string, "ppo". All the statistics will be saved to the same sub-folder

docs/Using-Virtual-Environment.md (53 lines changed)


# Using Virtual Environment
## What is a Virtual Environment?
A Virtual Environment is a self-contained directory tree that contains a Python installation
for a particular version of Python, plus a number of additional packages. To learn more about
Virtual Environments, see [here](https://docs.python.org/3/library/venv.html).
## Why should I use a Virtual Environment?
A Virtual Environment keeps all dependencies for the Python project separate from dependencies
of other projects. This has a few advantages:
1. It makes dependency management for the project easy.
1. It enables using and testing different library versions by quickly
spinning up a new environment and verifying the compatibility of the code with the
different versions.
Requirement - Python 3.6 must be installed on the machine you would like
to run ML-Agents on (either local laptop/desktop or remote server). Python 3.6 can be
installed from [here](https://www.python.org/downloads/).
## Installing Pip (Required)
1. Download the `get-pip.py` file using the command `curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py`
1. Run the following command: `python3 get-pip.py`
1. Check pip version using `pip3 -V`
Note (for Ubuntu users): If the `ModuleNotFoundError: No module named 'distutils.util'` error is encountered, then
python3-distutils needs to be installed. Install python3-distutils using `sudo apt-get install python3-distutils`
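For convenience, the same pip installation steps as a single block:
```sh
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py   # download the installer
python3 get-pip.py                                        # install pip
pip3 -V                                                   # verify the installed version
```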
## Mac OS X Setup
1. Create a folder where the virtual environments will reside `$ mkdir ~/python-envs`
1. To create a new environment named `sample-env` execute `$ python3 -m venv ~/python-envs/sample-env`
1. To activate the environment execute `$ source ~/python-envs/sample-env/bin/activate`
1. Verify pip version is the same as in the __Installing Pip__ section. In case it is not the latest, upgrade to
the latest pip version using `pip3 install --upgrade pip`
1. Install ML-Agents package using `$ pip3 install mlagents`
1. To deactivate the environment execute `$ deactivate`
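Taken together, the macOS/Linux steps above look roughly like this (the folder `~/python-envs`
and the name `sample-env` are just the examples used above):
```sh
mkdir -p ~/python-envs                        # folder to hold virtual environments
python3 -m venv ~/python-envs/sample-env      # create the environment
source ~/python-envs/sample-env/bin/activate  # activate it
pip3 install --upgrade pip                    # make sure pip is up to date
pip3 install mlagents                         # install the ML-Agents Python package
deactivate                                    # leave the environment when done
```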
## Ubuntu Setup
1. Install the python3-venv package using `$ sudo apt-get install python3-venv`
1. Follow the steps in the Mac OS X installation.
## Windows Setup
1. Create a folder where the virtual environments will reside `$ md python-envs`
1. To create a new environment named `sample-env` execute `$ python3 -m venv python-envs\sample-env`
1. To activate the environment execute `$ python-envs\sample-env\Scripts\activate`
1. Verify pip version is the same as in the __Installing Pip__ section. In case it is not the latest, upgrade to
the latest pip version using `pip3 install --upgrade pip`
1. Install ML-Agents package using `$ pip3 install mlagents`
1. To deactivate the environment execute `$ deactivate`

docs/Using-Docker.md (166 lines changed)


# Using Docker For ML-Agents
We currently offer a solution for Windows and Mac users who would like to do
training or inference using Docker. This option may be appealing to those who
would like to avoid installing Python and TensorFlow themselves. The current
setup forces both TensorFlow and Unity to _only_ rely on the CPU for
computations. Consequently, our Docker simulation does not use a GPU and uses
[`Xvfb`](https://en.wikipedia.org/wiki/Xvfb) to do visual rendering. `Xvfb` is a
utility that enables `ML-Agents` (or any other application) to do rendering
virtually, i.e. it does not assume that the machine running `ML-Agents` has a GPU
or a display attached to it. This means that rich environments which involve
agents using camera-based visual observations might be slower.
## Requirements
- Unity _Linux Build Support_ Component
- [Docker](https://www.docker.com)
## Setup
- [Download](https://unity3d.com/get-unity/download) the Unity Installer and add
the _Linux Build Support_ Component
- [Download](https://www.docker.com/community-edition#/download) and install
Docker if you don't have it setup on your machine.
- Since Docker runs a container in an environment that is isolated from the host
machine, a mounted directory in your host machine is used to share data, e.g.
the trainer configuration file, Unity executable, curriculum files and
TensorFlow graph. For convenience, we created an empty `unity-volume`
directory at the root of the repository for this purpose, but feel free to use
any other directory. The remainder of this guide assumes that the
`unity-volume` directory is the one used.
## Usage
Using Docker for ML-Agents involves three steps: building the Unity environment
with specific flags, building a Docker container and, finally, running the
container. If you are not familiar with building a Unity environment for
ML-Agents, please read through our [Getting Started with the 3D Balance Ball
Example](Getting-Started-with-Balance-Ball.md) guide first.
### Build the Environment (Optional)
_If you want to use the Editor to perform training, you can skip this step._
Since Docker typically runs a container sharing a (linux) kernel with the host
machine, the Unity environment **has** to be built for the **linux platform**.
When building a Unity environment, please select the following options from the
Build Settings window:
- Set the _Target Platform_ to `Linux`
- Set the _Architecture_ to `x86_64`
- If the environment does not contain visual observations, you can select the
`headless` option here.
Then click `Build`, pick an environment name (e.g. `3DBall`) and set the output
directory to `unity-volume`. After building, ensure that the file
`<environment-name>.x86_64` and subdirectory `<environment-name>_Data/` are
created under `unity-volume`.
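A quick way to sanity-check the build output from the host (assuming the environment was named
`3DBall`):
```sh
# Both the executable and its data directory should be in the mounted folder.
ls unity-volume
# expected to include: 3DBall.x86_64  3DBall_Data/
```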
![Build Settings For Docker](images/docker_build_settings.png)
### Build the Docker Container
First, make sure the Docker engine is running on your machine. Then build the
Docker container by calling the following command at the top-level of the
repository:
```sh
docker build -t <image-name> .
```
Replace `<image-name>` with a name for the Docker image, e.g.
`balance.ball.v0.1`.
### Run the Docker Container
Run the Docker container by calling the following command at the top-level of
the repository:
```sh
docker run -it --name <container-name> \
--mount type=bind,source="$(pwd)"/unity-volume,target=/unity-volume \
-p 5005:5005 \
-p 6006:6006 \
<image-name>:latest \
--docker-target-name=unity-volume \
<trainer-config-file> \
--env=<environment-name> \
--train \
--run-id=<run-id>
```
Notes on argument values:
- `<container-name>` is used to identify the container (in case you want to
interrupt and terminate it). This is optional and Docker will generate a
random name if this is not set. _Note that this must be unique for every run
of a Docker image._
- `<image-name>` references the image name used when building the container.
- `<environment-name>` __(Optional)__: If you are training with a linux
executable, this is the name of the executable. If you are training in the
Editor, do not pass a `<environment-name>` argument and press the
:arrow_forward: button in Unity when the message _"Start training by pressing
the Play button in the Unity Editor"_ is displayed on the screen.
- `source`: Reference to the path in your host OS where you will store the Unity
executable.
- `target`: Tells Docker to mount the `source` path as a disk with this name.
- `docker-target-name`: Tells the ML-Agents Python package what the name of the
disk where it can read the Unity executable and store the graph. **This should
therefore be identical to `target`.**
- `trainer-config-file`, `train`, `run-id`: ML-Agents arguments passed to
`mlagents-learn`. `trainer-config-file` is the filename of the trainer config
file, `train` trains the algorithm, and `run-id` is used to tag each
experiment with a unique identifier. We recommend placing the trainer-config
file inside `unity-volume` so that the container has access to the file.
To train with a `3DBall` environment executable, the command would be:
```sh
docker run -it --name 3DBallContainer.first.trial \
--mount type=bind,source="$(pwd)"/unity-volume,target=/unity-volume \
-p 5005:5005 \
-p 6006:6006 \
balance.ball.v0.1:latest 3DBall \
--docker-target-name=unity-volume \
trainer_config.yaml \
--env=3DBall \
--train \
--run-id=3dball_first_trial
```
For more detail on Docker mounts, check out
[these](https://docs.docker.com/storage/bind-mounts/) docs from Docker.
**NOTE** If you are training using docker for environments that use visual observations, you may need to increase the default memory that Docker allocates for the container. For example, see [here](https://docs.docker.com/docker-for-mac/#advanced) for instructions for Docker for Mac.
### Running Tensorboard
You can run Tensorboard to monitor your training instance on http://localhost:6006:
```sh
docker exec -it <container-name> tensorboard --logdir=/unity-volume/summaries --host=0.0.0.0
```
With our previous 3DBall example, this command would look like this:
```sh
docker exec -it 3DBallContainer.first.trial tensorboard --logdir=/unity-volume/summaries --host=0.0.0.0
```
For more details on Tensorboard, check out the documentation about [Using Tensorboard](Using-Tensorboard.md).
### Stopping Container and Saving State
If you are satisfied with the training progress, you can stop the Docker
container while saving state by either using `Ctrl+C` or `⌘+C` (Mac) or by using
the following command:
```sh
docker kill --signal=SIGINT <container-name>
```
`<container-name>` is the name of the container specified in the earlier `docker
run` command. If you didn't specify one, you can find the randomly generated
identifier by running `docker container ls`.

docs/Installation-Windows.md (351 lines changed)


# Installing ML-Agents Toolkit for Windows
The ML-Agents toolkit supports Windows 10. While it might be possible to run the
ML-Agents toolkit using other versions of Windows, it has not been tested on
other versions. Furthermore, the ML-Agents toolkit has not been tested on a
Windows VM such as Bootcamp or Parallels.
To use the ML-Agents toolkit, you install Python and the required Python
packages as outlined below. This guide also covers how to set up GPU-based training
(for advanced users). GPU-based training is not currently required for the
ML-Agents toolkit. However, training on a GPU might be required by future
versions and features.
## Step 1: Install Python via Anaconda
[Download](https://www.anaconda.com/download/#windows) and install Anaconda for
Windows. By using Anaconda, you can manage separate environments for different
distributions of Python. Python 3.6.1 or higher is required as we no longer support
Python 2. In this guide, we are using Python version 3.6 and Anaconda version
5.1
([64-bit](https://repo.continuum.io/archive/Anaconda3-5.1.0-Windows-x86_64.exe)
or [32-bit](https://repo.continuum.io/archive/Anaconda3-5.1.0-Windows-x86.exe)
direct links).
<p align="center">
<img src="images/anaconda_install.PNG"
alt="Anaconda Install"
width="500" border="10" />
</p>
We recommend the default _advanced installation options_. However, select the
options appropriate for your specific situation.
<p align="center">
<img src="images/anaconda_default.PNG" alt="Anaconda Install" width="500" border="10" />
</p>
After installation, you must open __Anaconda Navigator__ to finish the setup.
From the Windows search bar, type _anaconda navigator_. You can close Anaconda
Navigator after it opens.
If environment variables were not created, you will see the error "conda is not
recognized as an internal or external command" when you type `conda` into the
command line. To solve this you will need to set the environment variable
correctly.
Type `environment variables` in the search bar (this can be reached by hitting
the Windows key or the bottom left Windows button). You should see an option
called __Edit the system environment variables__.
<p align="center">
<img src="images/edit_env_var.png"
alt="edit env variables"
width="250" border="10" />
</p>
From here, click the __Environment Variables__ button. Double click "Path" under
__System variables__ to edit the "Path" variable, then click __New__ to add the
following new paths.
```console
%UserProfile%\Anaconda3\Scripts
%UserProfile%\Anaconda3\Scripts\conda.exe
%UserProfile%\Anaconda3
%UserProfile%\Anaconda3\python.exe
```
## Step 2: Setup and Activate a New Conda Environment
You will create a new [Conda environment](https://conda.io/docs/) to be used
with the ML-Agents toolkit. This means that all the packages that you install
are localized to just this environment. It will not affect any other
installation of Python or other environments. Whenever you want to run
ML-Agents, you will need to activate this Conda environment.
To create a new Conda environment, open a new Anaconda Prompt (_Anaconda Prompt_
in the search bar) and type in the following command:
```sh
conda create -n ml-agents python=3.6
```
You may be asked to install new packages. Type `y` and press enter _(make sure
you are connected to the Internet)_. You must install these required packages.
The new Conda environment is called ml-agents and uses Python version 3.6.
<p align="center">
<img src="images/conda_new.PNG" alt="Anaconda Install" width="500" border="10" />
</p>
To use this environment, you must activate it. _(To use this environment in the
future, you can run the same command)_. In the same Anaconda Prompt, type in the
following command:
```sh
activate ml-agents
```
You should see `(ml-agents)` prepended on the last line.
Next, install `tensorflow`. Install this package using `pip`, which is a
package management system used to install Python packages. The latest versions of
TensorFlow won't work, so you will need to make sure that you install version
1.7.1. In the same Anaconda Prompt, type in the following command _(make sure
you are connected to the Internet)_:
```sh
pip install tensorflow==1.7.1
```
## Step 3: Install Required Python Packages
The ML-Agents toolkit depends on a number of Python packages. Use `pip` to
install these Python dependencies.
If you haven't already, clone the ML-Agents Toolkit Github repository to your
local computer. You can do this using Git ([download
here](https://git-scm.com/download/win)) and running the following commands in
an Anaconda Prompt _(if you open a new prompt, be sure to activate the ml-agents
Conda environment by typing `activate ml-agents`)_:
```sh
git clone https://github.com/Unity-Technologies/ml-agents.git
```
If you don't want to use Git, you can always directly download all the files
[here](https://github.com/Unity-Technologies/ml-agents/archive/master.zip).
The `UnitySDK` subdirectory contains the Unity Assets to add to your projects.
It also contains many [example environments](Learning-Environment-Examples.md)
to help you get started.
The `ml-agents` subdirectory contains a Python package which provides deep reinforcement
learning trainers to use with Unity environments.
The `ml-agents-envs` subdirectory contains a Python API to interface with Unity, which
the `ml-agents` package depends on.
The `gym-unity` subdirectory contains a package to interface with OpenAI Gym.
Keep in mind where the files were downloaded, as you will need the
trainer config files in this directory when running `mlagents-learn`.
Make sure you are connected to the Internet and then type in the Anaconda
Prompt:
```console
pip install mlagents
```
This will complete the installation of all the required Python packages to run
the ML-Agents toolkit.
Sometimes on Windows, when you use pip to install certain Python packages, pip can get stuck trying to read the package cache. If you see this, you can try:
```console
pip install mlagents --no-cache-dir
```
The `--no-cache-dir` option tells pip to disable the cache.
### Installing for Development
If you intend to make modifications to `ml-agents` or `ml-agents-envs`, you should install
the packages from the cloned repo rather than from PyPi. To do this, you will need to install
`ml-agents` and `ml-agents-envs` separately.
In our example, the files are located in `C:\Downloads`. After you have either
cloned or downloaded the files, from the Anaconda Prompt, change to the ml-agents
subdirectory inside the ml-agents directory:
```console
cd C:\Downloads\ml-agents
```
From the repo's main directory, now run:
```console
cd ml-agents-envs
pip install -e .
cd ..
cd ml-agents
pip install -e .
```
Running pip with the `-e` flag will let you make changes to the Python files directly and have those
reflected when you run `mlagents-learn`. It is important to install these packages in this order as the
`mlagents` package depends on `mlagents_envs`, and installing it in the other
order will download `mlagents_envs` from PyPi.
## (Optional) Step 4: GPU Training using The ML-Agents Toolkit
A GPU is not required for the ML-Agents toolkit and currently won't speed up PPO
training by much (though future features may benefit from a GPU). This is a guide
for advanced users who want to train using GPUs.
Additionally, you will need to check if your GPU is CUDA compatible. Please
check Nvidia's page [here](https://developer.nvidia.com/cuda-gpus).
Currently for the ML-Agents toolkit, only CUDA v9.0 and cuDNN v7.0.5 are supported.
### Install Nvidia CUDA toolkit
[Download](https://developer.nvidia.com/cuda-toolkit-archive) and install the
CUDA toolkit 9.0 from Nvidia's archive. The toolkit includes GPU-accelerated
libraries, debugging and optimization tools, a C/C++ (Visual Studio 2017)
compiler, and a runtime library, and is needed to run the ML-Agents toolkit. In
this guide, we are using version
[9.0.176](https://developer.nvidia.com/compute/cuda/9.0/Prod/network_installers/cuda_9.0.176_win10_network-exe).
Before installing, please make sure you __close any running instances of Unity
or Visual Studio__.
Run the installer and select the Express option. Note the directory where you
installed the CUDA toolkit. In this guide, we installed it in the directory
`C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0`.
### Install Nvidia cuDNN library
[Download](https://developer.nvidia.com/cudnn) and install the cuDNN library
from Nvidia. cuDNN is a GPU-accelerated library of primitives for deep neural
networks. Before you can download, you will need to sign up for free to the
Nvidia Developer Program.
<p align="center">
<img src="images/cuDNN_membership_required.png"
alt="cuDNN membership required"
width="500" border="10" />
</p>
Once you've signed up, go back to the cuDNN
[downloads page](https://developer.nvidia.com/cudnn).
You may or may not be asked to fill out a short survey. When you get to the list of
cuDNN releases, __make sure you are downloading the right version for the CUDA
toolkit you installed in Step 1.__ In this guide, we are using version 7.0.5 for
CUDA toolkit version 9.0
([direct link](https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v7.0.5/prod/9.0_20171129/cudnn-9.0-windows10-x64-v7)).
After you have downloaded the cuDNN files, you will need to extract the files
into the CUDA toolkit directory. In the cuDNN zip file, there are three folders
called `bin`, `include`, and `lib`.
<p align="center">
<img src="images/cudnn_zip_files.PNG"
alt="cuDNN zip files"
width="500" border="10" />
</p>
Copy these three folders into the CUDA toolkit directory. The CUDA toolkit
directory is located at
`C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0`
<p align="center">
<img src="images/cuda_toolkit_directory.PNG"
alt="cuda toolkit directory"
width="500" border="10" />
</p>
### Set Environment Variables
You will need to add one environment variable and two path variables.
To set the environment variable, type `environment variables` in the search bar
(this can be reached by hitting the Windows key or the bottom left Windows
button). You should see an option called __Edit the system environment
variables__.
<p align="center">
<img src="images/edit_env_var.png"
alt="edit env variables"
width="250" border="10" />
</p>
From here, click the __Environment Variables__ button. Click __New__ to add a
new system variable _(make sure you do this under __System variables__ and not
User variables)_.
<p align="center">
<img src="images/new_system_variable.PNG"
alt="new system variable"
width="500" border="10" />
</p>
For __Variable Name__, enter `CUDA_HOME`. For the variable value, put the
directory location for the CUDA toolkit. In this guide, the directory location
is `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0`. Press __OK__ once.
<p align="center">
<img src="images/system_variable_name_value.PNG"
alt="system variable names and values"
width="500" border="10" />
</p>
To set the two path variables, inside the same __Environment Variables__ window
and under the second box called __System Variables__, find a variable called
`Path` and click __Edit__. You will add two directories to the list. For this
guide, the two entries would look like:
```console
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\extras\CUPTI\libx64
```
Make sure to replace the relevant directory locations with the ones where you
installed the CUDA toolkit. _Please note that case sensitivity matters_.
<p align="center">
<img src="images/path_variables.PNG"
alt="Path variables"
width="500" border="10" />
</p>
### Install TensorFlow GPU
Next, install `tensorflow-gpu` using `pip`. You'll need version 1.7.1. In an
Anaconda Prompt with the Conda environment ml-agents activated, type in the
following command to uninstall TensorFlow for CPU and install TensorFlow
for GPU _(make sure you are connected to the Internet)_:
```sh
pip uninstall tensorflow
pip install tensorflow-gpu==1.7.1
```
Lastly, you should test to see if everything installed properly and that
TensorFlow can identify your GPU. In the same Anaconda Prompt, open Python
in the Prompt by calling:
```sh
python
```
And then type the following commands:
```python
import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
```
You should see something similar to:
```console
Found device 0 with properties ...
```
## Acknowledgments
We would like to thank
[Jason Weimann](https://unity3d.college/2017/10/25/machine-learning-in-unity3d-setting-up-the-environment-tensorflow-for-agentml-on-windows-10/)
and
[Nitish S. Mutha](http://blog.nitishmutha.com/tensorflow/2017/01/22/TensorFlow-with-gpu-for-windows.html)
for writing the original articles which were used to create this guide.