
Develop mm docs various prettier (#3782)

* Running prettier formatting as a follow-up PR to #3775

Running the files (except Training-ML-Agents.md) through the `prettier` linter.

https://github.com/Unity-Technologies/ml-agents/pull/3775/files

* minor fixes

Changed a header from “Custom Metrics from C#” to “Custom Metrics from Unity”

Fixed formatting in FAQ

* Minor correction.
/develop/gym-wrapper
GitHub, 4 years ago
Current commit
b881c227
11 files changed, with 929 insertions and 756 deletions
1. com.unity.ml-agents/CHANGELOG.md (307 changed lines)
2. docs/FAQ.md (73 changed lines)
3. docs/Installation-Anaconda-Windows.md (120 changed lines)
4. docs/Learning-Environment-Examples.md (642 changed lines)
5. docs/Learning-Environment-Executable.md (54 changed lines)
6. docs/Readme.md (99 changed lines)
7. docs/Training-on-Amazon-Web-Service.md (147 changed lines)
8. docs/Training-on-Microsoft-Azure.md (73 changed lines)
9. docs/Using-Docker.md (34 changed lines)
10. docs/Using-Tensorboard.md (73 changed lines)
11. docs/Using-Virtual-Environment.md (63 changed lines)

com.unity.ml-agents/CHANGELOG.md (307 changed lines)


# Changelog
and this project adheres to
[Semantic Versioning](http://semver.org/spec/v2.0.0.html).
- The `--load` and `--train` command-line flags have been deprecated. Training
now happens by default; use `--resume` to resume training instead. (#3705)
- The Jupyter notebooks have been removed from the repository.
- Introduced the `SideChannelUtils` to register, unregister and access side
channels.
- `Academy.FloatProperties` was removed; use
`SideChannelUtils.GetSideChannel<FloatPropertiesChannel>()` instead.
- Removed the multi-agent gym option from the gym wrapper. For multi-agent
scenarios, use the [Low Level Python API](../docs/Python-API.md).
- The low level Python API has changed. You can look at the document
[Low Level Python API documentation](../docs/Python-API.md) for more
information. If you use `mlagents-learn` for training, this should be a
transparent change.
- Added ability to start training (initialize model weights) from a previous run
ID. (#3710)
- The internal event `Academy.AgentSetStatus` was renamed to
`Academy.AgentPreStep` and made public.
- The offset logic was removed from DecisionRequester.
- The signature of `Agent.Heuristic()` was changed to take a `float[]` as a
parameter, instead of returning the array. This was done to prevent a common
source of error where users would return arrays of the wrong size; a short
before/after sketch appears after this list.
- The communication API version has been bumped up to 1.0.0 and will use
[Semantic Versioning](https://semver.org/) to do compatibility checks for
communication between Unity and the Python process.
- The obsolete `Agent` methods `GiveModel`, `Done`, `InitializeAgent`,
`AgentAction` and `AgentReset` have been removed.
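
The `Agent.Heuristic()` change above is easiest to see side by side. The sketch
below is illustrative only and assumes the `MLAgents` base namespace implied by
the namespace entries elsewhere in these notes; the input axis and action size
are placeholders.

```csharp
using UnityEngine;
using MLAgents;

public class ManualAgent : Agent
{
    // Before: Heuristic() returned a newly allocated array, and returning an
    // array of the wrong size was a common source of errors.
    //
    //   public override float[] Heuristic()
    //   {
    //       return new float[] { Input.GetAxis("Horizontal") };
    //   }

    // After: the buffer is supplied by the caller and already sized to match
    // the Behavior Parameters, so the agent only fills it in.
    public override void Heuristic(float[] actionsOut)
    {
        actionsOut[0] = Input.GetAxis("Horizontal"); // placeholder input axis
    }
}
```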
- Format of console output has changed slightly and now matches the name of the
model/summary directory. (#3630, #3616)
- Added a feature to allow sending stats from C# environments to TensorBoard
(and other python StatsWriters). To do this from your code, use
`SideChannelUtils.GetSideChannel<StatsSideChannel>().AddStat(key, value)`
(#3660); a minimal usage sketch appears after this list.
- Renamed 'Generalization' feature to 'Environment Parameter Randomization'.
- Timer files now contain a dictionary of metadata, including things like the
package version numbers.
- SideChannel IncomingMessages methods now take an optional default argument,
which is used when trying to read more data than the message contains.
- The way that UnityEnvironment decides the port was changed. If no port is
specified, the behavior will depend on the `file_name` parameter. If it is
`None`, 5004 (the editor port) will be used; otherwise 5005 (the base
environment port) will be used.
- Fixed an issue where exceptions from environments provided a returncode of 0.
(#3680)
- Running `mlagents-learn` with the same `--run-id` twice will no longer
overwrite the existing files. (#3705)
- `StackingSensor` was changed from `internal` visibility to `public`
- Updated Barracuda to 0.6.3-preview.
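
As a rough illustration of the custom-stats entry above, the snippet below
records a metric at the end of each episode. The metric name, reward logic, and
the place it is recorded from are made up for this example; the namespaces
follow the `MLAgents`/`MLAgents.SideChannels` entries elsewhere in these notes.

```csharp
using MLAgents;
using MLAgents.SideChannels;

public class RewardReporter : Agent
{
    float m_EpisodeReward;

    public override void OnActionReceived(float[] vectorAction)
    {
        // ... apply the action, then accumulate whatever reward was granted.
        m_EpisodeReward += 0.01f; // placeholder reward logic

        if (m_EpisodeReward >= 1f)
        {
            // Send a custom stat to TensorBoard (and any other Python StatsWriter).
            SideChannelUtils
                .GetSideChannel<StatsSideChannel>()
                .AddStat("Custom/EpisodeReward", m_EpisodeReward);

            EndEpisode();
            m_EpisodeReward = 0f;
        }
    }
}
```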
### Bug Fixes
- Fixed a display bug when viewing Demonstration files in the inspector. The
shapes of the observations in the file now display correctly. (#3771)
- Raise the wall in CrawlerStatic scene to prevent Agent from falling off.
(#3650)
- Fixed an issue where specifying `vis_encode_type` was required only for SAC.
(#3677)
- Fixed the reported entropy values for continuous actions (#3684)
- Fixed an issue where switching models using `SetModel()` during training would
use an excessive amount of memory. (#3664)
- Environment subprocesses now close immediately on timeout or wrong API
version. (#3679)
- Fixed an issue in the gym wrapper that would raise an exception if an Agent
called EndEpisode multiple times in the same step. (#3700)
- Fixed an issue where logging output was not visible; logging levels are now
set consistently. (#3703)
- `Agent.CollectObservations` now takes a VectorSensor argument. (#3352, #3389)
- Added `Agent.CollectDiscreteActionMasks` virtual method with a
`DiscreteActionMasker` argument to specify which discrete actions are
unavailable to the Agent. (#3525)
- Beta support for ONNX export was added. If the `tf2onnx` python package is
installed, models will be saved to `.onnx` as well as `.nn` format. Note that
Barracuda 0.6.0 or later is required to import the `.onnx` files properly
- Multi-GPU training and the `--multi-gpu` option has been removed temporarily.
(#3345)
- All Sensor related code has been moved to the namespace `MLAgents.Sensors`.
- All SideChannel related code has been moved to the namespace
`MLAgents.SideChannels`.
- `BrainParameters` and `SpaceType` have been removed from the public API
- `BehaviorParameters` have been removed from the public API.
- The following methods in the `Agent` class have been deprecated and will be
removed in a later release (a combined sketch of the updated `Agent` API
appears after this list):
- `InitializeAgent()` was renamed to `Initialize()`
- `AgentAction()` was renamed to `OnActionReceived()`
- `AgentReset()` was renamed to `OnEpisodeBegin()`
- `Done()` was renamed to `EndEpisode()`
- `GiveModel()` was renamed to `SetModel()`
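
Taken together, the renames and the new `VectorSensor` argument above look
roughly like the minimal agent below. The observation and reward contents are
placeholders; only the method names and signatures come from the notes above,
and the namespaces follow the `MLAgents`/`MLAgents.Sensors` entries.

```csharp
using UnityEngine;
using MLAgents;
using MLAgents.Sensors;

public class RollerAgent : Agent
{
    Rigidbody m_Body;

    public override void Initialize()       // was InitializeAgent()
    {
        m_Body = GetComponent<Rigidbody>();
    }

    public override void OnEpisodeBegin()   // was AgentReset()
    {
        transform.localPosition = Vector3.zero;
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Observations are now written to the VectorSensor that is passed in.
        sensor.AddObservation(transform.localPosition);
        sensor.AddObservation(m_Body.velocity.x);
    }

    public override void OnActionReceived(float[] vectorAction)  // was AgentAction()
    {
        AddReward(-0.0005f);                // placeholder step penalty
        if (transform.localPosition.y < 0f)
        {
            EndEpisode();                   // was Done()
        }
    }
}
```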
- Monitor.cs was moved to Examples. (#3372)
- Automatic stepping for Academy is now controlled from the
AutomaticSteppingEnabled property. (#3376)
- The GetEpisodeCount, GetStepCount, and GetTotalStepCount methods of Academy
were changed to the EpisodeCount, StepCount, and TotalStepCount properties,
respectively. (#3376)
- Several classes were changed from public to internal visibility. (#3390)
- Academy.RegisterSideChannel and UnregisterSideChannel methods were added.
(#3391)
- A tutorial on adding custom SideChannels was added (#3391)
- The stepping logic for the Agent and the Academy has been simplified (#3448)
- Updated Barracuda to 0.6.1-preview.
- The interface for `RayPerceptionSensor.PerceiveStatic()` was changed to take
an input class and write to an output class, and the method was renamed to
`Perceive()`.
- The checkpoint file suffix was changed from `.cptk` to `.ckpt` (#3470)
- The command-line argument used to determine the port that an environment will
listen on was changed from `--port` to `--mlagents-port`.
- `DemonstrationRecorder` can now record observations outside of the editor.
- `DemonstrationRecorder` now has an optional path for the demonstrations. This
will default to `Application.dataPath` if not set.
- `DemonstrationStore` was changed to accept a `Stream` for its constructor, and
was renamed to `DemonstrationWriter`
- The method `GetStepCount()` on the Agent class has been replaced with the
property getter `StepCount`
- `RayPerceptionSensorComponent` and related classes now display the debug
gizmos whenever the Agent is selected (not just Play mode).
- Most fields on `RayPerceptionSensorComponent` can now be changed while the
editor is in Play mode. The exceptions to this are fields that affect the
number of observations.
- Most fields on `CameraSensorComponent` and `RenderTextureSensorComponent` were
changed to private and replaced by properties with the same name.
- Unused static methods from the `Utilities` class (ShiftLeft, ReplaceRange,
AddRangeNoAlloc, and GetSensorFloatObservationSize) were removed.
- The `Agent` class is no longer abstract.
- SensorBase was moved out of the package and into the Examples directory.
- `AgentInfo.actionMasks` has been renamed to `AgentInfo.discreteActionMasks`.
- `DecisionRequester` has been made internal (you can still use the
DecisionRequesterComponent from the inspector). `RepeatAction` was renamed
`TakeActionsBetweenDecisions` for clarity. (#3555)
- The `IFloatProperties` interface has been removed.
- Fix #3579.
- Improved inference performance for models with multiple action branches.
(#3598)
- Fixed an issue when using GAIL with less than `batch_size` number of
demonstrations. (#3591)
- The interfaces to the `SideChannel` classes (on C# and python) have changed to
use new `IncomingMessage` and `OutgoingMessage` classes. These should make
reading and writing data to the channel easier (#3596); a sketch of a custom
channel built on these classes appears after this list.
- Updated the ExpertPyramid.demo example demonstration file (#3613)
- Updated project version for example environments to 2018.4.18f1. (#3618)
- Changed the Product Name in the example environments to remove spaces, so that
the default build executable file doesn't contain spaces. (#3612)
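
The `IncomingMessage`/`OutgoingMessage` entry above is the one most likely to
affect existing custom side channels, so here is a rough sketch of a
string-passing channel written against the new classes. The channel ID is an
arbitrary example value, and the base-class details (the `ChannelId` setter, the
override's access level, and the `SideChannelUtils` registration call in the
trailing comment) are assumptions, not an excerpt from the package.

```csharp
using System;
using UnityEngine;
using MLAgents.SideChannels;

public class StringLogSideChannel : SideChannel
{
    public StringLogSideChannel()
    {
        // Arbitrary example GUID; each channel type needs its own.
        ChannelId = new Guid("621f0a70-4f87-11ea-a6bf-784f4387d1f7");
    }

    // Data arriving from Python is read from an IncomingMessage.
    public override void OnMessageReceived(IncomingMessage msg)
    {
        Debug.Log("Received from Python: " + msg.ReadString());
    }

    // Data going to Python is written to an OutgoingMessage and queued.
    public void SendDebugStatementToPython(string logString)
    {
        using (var msgOut = new OutgoingMessage())
        {
            msgOut.WriteString(logString);
            QueueMessageToSend(msgOut);
        }
    }
}

// Register the channel once at startup, e.g. (assumed helper name):
// SideChannelUtils.RegisterSideChannel(new StringLogSideChannel());
```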
- Fixed an issue which caused self-play training sessions to consume a lot of
memory. (#3451)
- Fixed an IndexError when using GAIL or behavioral cloning with demonstrations
recorded with 0.14.0 or later (#3464)
- Fixed a bug with the rewards of multiple Agents in the gym interface (#3471,
#3496)
- A new self-play mechanism for training agents in adversarial scenarios was
added (#3194)
- Tennis and Soccer environments were refactored to enable training with
self-play (#3194, #3331)
- UnitySDK folder was split into a Unity Package (com.unity.ml-agents) and our
examples were moved to the Project folder (#3267)
- In order to reduce the size of the API, several classes and methods were
marked as internal or private. Some public fields on the Agent were trimmed
(#3342, #3353, #3269)
- Decision Period and on-demand decision checkboxes were removed from the Agent.
on-demand decision is now the default (#3243)
- Calling Done() on the Agent will reset it immediately and call the AgentReset
virtual method (#3291, #3242)
- The "Reset on Done" setting in AgentParameters was removed; this is now always
true. AgentOnDone virtual method on the Agent was removed (#3311, #3222)
- Trainer steps are now counted per-Agent, not per-environment as in previous
versions. For instance, if you have 10 Agents in the scene, 20 environment
steps now correspond to 200 steps as printed in the terminal and in
Tensorboard (#3113)
- Curriculum config files are now YAML formatted and all curricula for a
training run are combined into a single file (#3186)
- ML-Agents components, such as BehaviorParameters and various Sensor
implementations, now appear in the Components menu (#3231)
- Exceptions are now raised in Unity (in debug mode only) if NaN observations or
rewards are passed (#3221)
- RayPerception MonoBehavior, which was previously deprecated, was removed
(#3304)
- Uncompressed visual (i.e. 3d float arrays) observations are now supported.
CameraSensorComponent and RenderTextureSensor now have an option to write
uncompressed observations (#3148)
- Agent’s handling of observations during training was improved so that an extra
copy of the observations is no longer maintained (#3229)
- Error message for missing trainer config files was improved to include the
absolute path (#3230)
- A bug that caused RayPerceptionSensor to behave inconsistently with transforms
that have non-1 scale was fixed (#3321)
- Some small bugfixes to tensorflow_to_barracuda.py were backported from the
barracuda release (#3341)
- Base port in the jupyter notebook example was updated to use the same port
that the editor uses (#3283)
### This is the first release of _Unity Package ML-Agents_.
_Short description of this release_

docs/FAQ.md (73 changed lines)


## Installation problems
### Tensorflow dependency
ML Agents requires TensorFlow; if you don't already have it installed, `pip`
will try to install it when you install the ml-agents package.
it means that there is no version of TensorFlow for your python environment.
Some known potential causes are:
- You're using 32-bit python instead of 64-bit. See the answer
[here](https://stackoverflow.com/a/1405971/224264) for how to tell which you
have installed.
- You're using python 3.8. Tensorflow plans to release packages for this as soon
as possible; see
[this issue](https://github.com/tensorflow/tensorflow/issues/33374) for more
details.
- You have the `tensorflow-gpu` package installed. This is equivalent to
`tensorflow`; however, `pip` doesn't recognize this. The best way to resolve
this is to update to `tensorflow==1.15.0`, which provides GPU support in the
same package (see the
[release notes](https://github.com/tensorflow/tensorflow/issues/33374) for
more details).
- You're on another architecture (e.g. ARM) which requires vendor provided
packages.
In all of these cases, the issue is a pip/python environment setup issue. Please
search the tensorflow github issues for similar problems and solutions before
creating a new issue.
If you directly import your Unity environment without building it in the editor,
you might need to give it additional permissions to execute it.
If you receive such a permission error on macOS, run:

```sh
chmod -R 755 *.app
```
On Windows, you can find
[instructions](<https://technet.microsoft.com/en-us/library/cc754344(v=ws.11).aspx>).
## Environment Connection Timeout

There may be a number of possible causes:
- _Cause_: There may be no agent in the scene
- _Cause_: On OSX, the firewall may be preventing communication with the
- _Cause_: An error happened in the Unity Environment preventing communication.
_Solution_: Look into the
[log files](https://docs.unity3d.com/Manual/LogFiles.html) generated by the
Unity Environment to figure out what error happened.
- _Cause_: You have assigned `HTTP_PROXY` and `HTTPS_PROXY` values in your
If you receive an exception
`"Couldn't launch new environment because communication port {} is still in use. "`,
you can change the worker number in the Python script when calling
```python
UnityEnvironment(file_name=filename, worker_id=X)
```

If you receive a message `Mean reward : nan` when attempting to train a model
using PPO, this is due to the episodes of the Learning Environment not
terminating. In order to address this, set `Max Steps` for the Agents within the
Scene Inspector to a value greater than 0. Alternatively, it is possible to
manually set `done` conditions for episodes from within scripts for custom
episode-terminating events.
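
Since the paragraph above mentions setting `done` conditions from scripts, here
is a minimal sketch of that idea against the current `Agent` API; the height
check and reward value are arbitrary stand-ins for whatever custom event should
terminate the episode.

```csharp
using UnityEngine;
using MLAgents;

public class FallDetectingAgent : Agent
{
    public override void OnActionReceived(float[] vectorAction)
    {
        // ... apply the action ...

        // Custom episode-terminating event: end the episode as soon as the
        // agent drops below the platform, so episodes always terminate and the
        // mean reward is no longer reported as nan.
        if (transform.localPosition.y < -1f)
        {
            SetReward(-1f); // placeholder terminal reward
            EndEpisode();
        }
    }
}
```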

docs/Installation-Anaconda-Windows.md (120 changed lines)


# Installing ML-Agents Toolkit for Windows (Deprecated)
:warning: **Note:** We no longer use this guide ourselves and so it may not work
correctly. We've decided to keep it up just in case it is helpful to you.
The ML-Agents toolkit supports Windows 10. While it might be possible to run the
ML-Agents toolkit using other versions of Windows, it has not been tested on

[Download](https://www.anaconda.com/download/#windows) and install Anaconda for
Windows. By using Anaconda, you can manage separate environments for different
distributions of Python. Python 3.6.1 or higher is required as we no longer
support Python 2. In this guide, we are using Python version 3.6 and Anaconda
version 5.1
([64-bit](https://repo.continuum.io/archive/Anaconda3-5.1.0-Windows-x86_64.exe)
or [32-bit](https://repo.continuum.io/archive/Anaconda3-5.1.0-Windows-x86.exe)
direct links).

<img src="images/anaconda_default.PNG" alt="Anaconda Install" width="500" border="10" />
</p>
After installation, you must open **Anaconda Navigator** to finish the setup.
From the Windows search bar, type _anaconda navigator_. You can close Anaconda
Navigator after it opens.

Type `environment variables` in the search bar (this can be reached by hitting
the Windows key or the bottom left Windows button). You should see an option
called **Edit the system environment variables**.
<p align="center">
<img src="images/edit_env_var.png"

From here, click the **Environment Variables** button. Double click "Path" under
**System variable** to edit the "Path" variable, click **New** to add the
following new paths.
```console

install these Python dependencies.
If you haven't already, clone the ML-Agents Toolkit Github repository to your
local computer. You can do this using Git
([download here](https://git-scm.com/download/win)) and running the following
commands in an Anaconda Prompt _(if you open a new prompt, be sure to activate
the ml-agents Conda environment by typing `activate ml-agents`)_:
The `--branch latest_release` option will switch to the tag of the latest stable
release. Omitting that will get the `master` branch which is potentially
unstable.
The `com.unity.ml-agents` subdirectory contains the core code to add to your
projects. The `Project` subdirectory contains many
[example environments](Learning-Environment-Examples.md) to help you get
started.
The `ml-agents` subdirectory contains a Python package which provides deep
reinforcement learning trainers to use with Unity environments.
The `ml-agents-envs` subdirectory contains a Python API to interface with Unity,
which the `ml-agents` package depends on.
Keep in mind where the files were downloaded, as you will need the trainer
config files in this directory when running `mlagents-learn`. Make sure you are
connected to the Internet and then type in the Anaconda Prompt:
```console
pip install mlagents

the ML-Agents toolkit.
Sometimes on Windows, when you use pip to install certain Python packages, the
pip will get stuck when trying to read the cache of the package. If you see
this, you can try:
```console
pip install mlagents --no-cache-dir

### Installing for Development
If you intend to make modifications to `ml-agents` or `ml-agents-envs`, you
should install the packages from the cloned repo rather than from PyPi. To do
this, you will need to install `ml-agents` and `ml-agents-envs` separately.
cloned or downloaded the files, from the Anaconda Prompt, change to the
ml-agents subdirectory inside the ml-agents directory:
```console
cd C:\Downloads\ml-agents

pip install -e .
```
Running pip with the `-e` flag will let you make changes to the Python files
directly and have those reflected when you run `mlagents-learn`. It is important
to install these packages in this order as the `mlagents` package depends on
`mlagents_envs`, and installing it in the other order will download
`mlagents_envs` from PyPi.
## (Optional) Step 4: GPU Training using The ML-Agents Toolkit

Additionally, you will need to check if your GPU is CUDA compatible. Please
check Nvidia's page [here](https://developer.nvidia.com/cuda-gpus).
Currently for the ML-Agents toolkit, only CUDA v9.0 and cuDNN v7.0.5 are
supported.
### Install Nvidia CUDA toolkit

this guide, we are using version
[9.0.176](https://developer.nvidia.com/compute/cuda/9.0/Prod/network_installers/cuda_9.0.176_win10_network-exe)).
Before installing, please make sure you **close any running instances of Unity
or Visual Studio**.
Run the installer and select the Express option. Note the directory where you
installed the CUDA toolkit. In this guide, we installed in the directory

</p>
Once you've signed up, go back to the cuDNN
[downloads page](https://developer.nvidia.com/cudnn). You may or may not be
asked to fill out a short survey. When you get to the list of cuDNN releases,
**make sure you are downloading the right version for the CUDA toolkit you
installed in Step 1.** In this guide, we are using version 7.0.5 for CUDA
toolkit version 9.0
([direct link](https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v7.0.5/prod/9.0_20171129/cudnn-9.0-windows10-x64-v7)).
After you have downloaded the cuDNN files, you will need to extract the files

To set the environment variable, type `environment variables` in the search bar
(this can be reached by hitting the Windows key or the bottom left Windows
button). You should see an option called **Edit the system environment
variables**.
<p align="center">
<img src="images/edit_env_var.png"

From here, click the **Environment Variables** button. Click **New** to add a
new system variable _(make sure you do this under **System variables** and not
User variables)_.
<p align="center">

</p>
For **Variable Name**, enter `CUDA_HOME`. For the variable value, put the
is `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0`. Press **OK** once.
<p align="center">
<img src="images/system_variable_name_value.PNG"

To set the two path variables, inside the same **Environment Variables** window
and under the second box called **System Variables**, find a variable called
`Path` and click **Edit**. You will add two directories to the list. For this
guide, the two entries would look like:
```console

Next, install `tensorflow-gpu` using `pip`. You'll need version 1.7.1. In an
Anaconda Prompt with the Conda environment ml-agents activated, type in the
following command to uninstall TensorFlow for cpu and install TensorFlow for gpu
_(make sure you are connected to the Internet)_:
```sh
pip uninstall tensorflow

Lastly, you should test to see if everything installed properly and that
TensorFlow can identify your GPU. In the same Anaconda Prompt, open Python in
the Prompt by calling:
```sh
python

docs/Learning-Environment-Examples.md (642 changed lines)


# Example Learning Environments
The Unity ML-Agents Toolkit includes an expanding set of example environments
that highlight the various features of the toolkit. These environments can also
serve as templates for new environments or as ways to test new ML algorithms.
Environments are located in `Project/Assets/ML-Agents/Examples` and summarized
below. Additionally, our
For the environments that highlight specific features of the toolkit, we provide
the pre-trained model files and the training config file that enables you to
train the scene yourself. The environments that are designed to serve as
challenges for researchers do not have accompanying pre-trained model files or
training configs and are marked as _Optional_ below.
[Making a New Learning Environment](Learning-Environment-Create-New.md) page. If
you would like to contribute environments, please see our
[contribution guidelines](../com.unity.ml-agents/CONTRIBUTING.md) page.
## Basic

- Set-up: A linear movement task where the agent must move left or right to
- Goal: Move to the most rewarding state.
- Agents: The environment contains one agent.
- Agent Reward Function:
- -0.01 at each step
- +0.1 for arriving at suboptimal state.
- +1.0 for arriving at optimal state.
- Behavior Parameters:
- Vector Observation space: One variable corresponding to current state.
- Vector Action space: (Discrete) Two possible actions (Move left, move
- Visual Observations: None
- Float Properties: None
- Benchmark Mean Reward: 0.93
- Set-up: A balance-ball task, where the agent balances the ball on its head.
- Goal: The agent must balance the ball on its head for as long as possible.
- Agents: The environment contains 12 agents of the same kind, all using the
- Agent Reward Function:
- +0.1 for every step the ball remains on its head.
- -1.0 if the ball falls off.
- Behavior Parameters:
- Vector Observation space: 8 variables corresponding to rotation of the agent
cube, and position and velocity of ball.
- Vector Observation space (Hard Version): 5 variables corresponding to
- Vector Action space: (Continuous) Size of 2, with one value corresponding to
- Visual Observations: None.
- Float Properties: Three
- scale: Specifies the scale of the ball in the 3 dimensions (equal across the
three dimensions)
- Default: 1
- Recommended Minimum: 0.2
- Recommended Maximum: 5
- gravity: Magnitude of gravity
- Default: 9.81
- Recommended Minimum: 4
- Recommended Maximum: 105
- mass: Specifies mass of the ball
- Default: 1
- Recommended Minimum: 0.1
- Recommended Maximum: 20
- Benchmark Mean Reward: 100
- Set-up: A version of the classic grid-world task. Scene contains agent, goal,
- Goal: The agent must navigate the grid to the goal while avoiding the
- Agents: The environment contains nine agents with the same Behavior
Parameters.
- Agent Reward Function:
- -0.01 for every step.
- +1.0 if the agent navigates to the goal position of the grid (episode ends).
- -1.0 if the agent navigates to an obstacle (episode ends).
- Behavior Parameters:
- Vector Observation space: None
- Vector Action space: (Discrete) Size of 4, corresponding to movement in
is turned on by default (this option can be toggled using the `Mask Actions`
checkbox within the `trueAgent` GameObject). The trained model file provided
was generated with action masking turned on.
- Visual Observations: One corresponding to top-down view of GridWorld.
- Float Properties: Three, corresponding to grid size, number of obstacles, and
- Benchmark Mean Reward: 0.8
- Set-up: Two-player game where agents control rackets to hit a ball over the
- Goal: The agents must hit the ball so that the opponent cannot hit a valid
return.
- Agents: The environment contains two agents with the same Behavior
Parameters. After training you can set the `Behavior Type` to `Heuristic Only`
on one of the Agent's Behavior Parameters to play against your trained model.
- Agent Reward Function (independent):
- +1.0 To the agent that wins the point. An agent wins a point by preventing
the opponent from hitting a valid return.
- -1.0 To the agent who loses the point.
- Behavior Parameters:
- Vector Observation space: 9 variables corresponding to position, velocity
- Vector Action space: (Continuous) Size of 3, corresponding to movement
- Visual Observations: None
- Float Properties: Three
- gravity: Magnitude of gravity
- Default: 9.81
- Recommended Minimum: 6
- Recommended Maximum: 20
- scale: Specifies the scale of the ball in the 3 dimensions (equal across the
three dimensions)
- Default: .5
- Recommended Minimum: 0.2
- Recommended Maximum: 5
- Set-up: A platforming environment where the agent can push a block around.
- Goal: The agent must push the block to the goal.
- Agents: The environment contains one agent.
- Agent Reward Function:
- -0.0025 for every step.
- +1.0 if the block touches the goal.
- Behavior Parameters:
- Vector Observation space: (Continuous) 70 variables corresponding to 14
- Vector Action space: (Discrete) Size of 6, corresponding to turn clockwise
- Visual Observations (Optional): One first-person camera. Use
`VisualPushBlock` scene. **The visual observation version of this
environment does not train with the provided default training parameters.**
- Float Properties: Four
- block_scale: Scale of the block along the x and z dimensions
- Default: 2
- Recommended Minimum: 0.5
- Recommended Maximum: 4
- dynamic_friction: Coefficient of friction for the ground material acting on
moving objects
- Default: 0
- Recommended Minimum: 0
- Recommended Maximum: 1
- static_friction: Coefficient of friction for the ground material acting on
stationary objects
- Default: 0
- Recommended Minimum: 0
- Recommended Maximum: 1
- block_drag: Effect of air resistance on block
- Default: 0.5
- Recommended Minimum: 0
- Recommended Maximum: 2000
- Benchmark Mean Reward: 4.5
- Set-up: A platforming environment where the agent can jump over a wall.
- Goal: The agent must use the block to scale the wall and reach the goal.
- Agents: The environment contains one agent linked to two different Models. The
Policy the agent is linked to changes depending on the height of the wall. The
change of Policy is done in the WallJumpAgent class.
- Agent Reward Function:
- -0.0005 for every step.
- +1.0 if the agent touches the goal.
- -1.0 if the agent falls off the platform.
- Behavior Parameters:
- Vector Observation space: Size of 74, corresponding to 14 ray casts each
- Vector Action space: (Discrete) 4 Branches:
- Forward Motion (3 possible actions: Forward, Backwards, No Action)
- Rotation (3 possible actions: Rotate Left, Rotate Right, No Action)
- Side Motion (3 possible actions: Left, Right, No Action)
- Jump (2 possible actions: Jump, No Action)
- Visual Observations: None
- Float Properties: Four
- Benchmark Mean Reward (Big & Small Wall): 0.8
- Set-up: Double-jointed arm which can move to target locations.
- Goal: Each agent must move its hand to the goal location and keep it there.
- Agents: The environment contains 10 agents with the same Behavior Parameters.
- Agent Reward Function (independent):
- +0.1 Each step agent's hand is in goal location.
- Behavior Parameters:
- Vector Observation space: 26 variables corresponding to position, rotation,
- Vector Action space: (Continuous) Size of 4, corresponding to torque
- Visual Observations: None.
- Float Properties: Five
- goal_size: radius of the goal zone
- Default: 5
- Recommended Minimum: 1
- Recommended Maximum: 10
- goal_speed: speed of the goal zone around the arm (in radians)
- Default: 1
- Recommended Minimum: 0.2
- Recommended Maximum: 4
- gravity
- Default: 9.81
- Recommended Minimum: 4
- Recommended Maximum: 20
- deviation: Magnitude of sinusoidal (cosine) deviation of the goal along the
vertical dimension
- Default: 0
- Recommended Minimum: 0
- Recommended Maximum: 5
- deviation_freq: Frequency of the cosine deviation of the goal along the
vertical dimension
- Default: 0
- Recommended Minimum: 0
- Recommended Maximum: 3
- Benchmark Mean Reward: 30
- Set-up: A creature with 4 arms and 4 forearms.
- Goal: Each agent must move its body toward the goal direction without falling.
- `CrawlerStaticTarget` - Goal direction is always forward.
- `CrawlerDynamicTarget` - Goal direction is randomized.
- Agents: The environment contains 3 agents with the same Behavior Parameters.
- Agent Reward Function (independent):
- +0.03 times body velocity in the goal direction.
- +0.01 times body direction alignment with goal direction.
- Behavior Parameters:
- Vector Observation space: 117 variables corresponding to position, rotation,
- Vector Action space: (Continuous) Size of 20, corresponding to target
- Visual Observations: None
- Float Properties: None
- Benchmark Mean Reward for `CrawlerStaticTarget`: 2000
- Benchmark Mean Reward for `CrawlerDynamicTarget`: 400
- Set-up: A multi-agent environment where agents compete to collect food.
- Goal: The agents must learn to collect as many green food spheres as possible
- Agents: The environment contains 5 agents with the same Behavior Parameters.
- Agent Reward Function (independent):
- +1 for interaction with green spheres
- -1 for interaction with red spheres
- Behavior Parameters:
- Vector Observation space: 53 corresponding to velocity of agent (2), whether
* Vector Action space: (Discrete) 4 Branches:
* Forward Motion (3 possible actions: Forward, Backwards, No Action)
* Side Motion (3 possible actions: Left, Right, No Action)
* Rotation (3 possible actions: Rotate Left, Rotate Right, No Action)
* Laser (2 possible actions: Laser, No Action)
* Visual Observations (Optional): First-person camera per-agent. Use
`VisualFoodCollector` scene. __The visual observation version of
this environment does not train with the provided default
training parameters.__
* Float Properties: Two
* laser_length: Length of the laser used by the agent
* Default: 1
* Recommended Minimum: 0.2
* Recommended Maximum: 7
* agent_scale: Specifies the scale of the agent in the 3 dimensions (equal across the three dimensions)
* Default: 1
* Recommended Minimum: 0.5
* Recommended Maximum: 5
* Benchmark Mean Reward: 10
- Vector Action space: (Discrete) 4 Branches:
- Forward Motion (3 possible actions: Forward, Backwards, No Action)
- Side Motion (3 possible actions: Left, Right, No Action)
- Rotation (3 possible actions: Rotate Left, Rotate Right, No Action)
- Laser (2 possible actions: Laser, No Action)
- Visual Observations (Optional): First-person camera per-agent. Use
`VisualFoodCollector` scene. **The visual observation version of this
environment does not train with the provided default training parameters.**
- Float Properties: Two
- laser_length: Length of the laser used by the agent
- Default: 1
- Recommended Minimum: 0.2
- Recommended Maximum: 7
- agent_scale: Specifies the scale of the agent in the 3 dimensions (equal
across the three dimensions)
- Default: 1
- Recommended Minimum: 0.5
- Recommended Maximum: 5
- Benchmark Mean Reward: 10
* Set-up: Environment where the agent needs to find information in a room,
- Set-up: Environment where the agent needs to find information in a room,
* Goal: Move to the goal which corresponds to the color of the block in the
- Goal: Move to the goal which corresponds to the color of the block in the
* Agents: The environment contains one agent.
* Agent Reward Function (independent):
* +1 For moving to correct goal.
* -0.1 For moving to incorrect goal.
* -0.0003 Existential penalty.
* Behavior Parameters:
* Vector Observation space: 30 corresponding to local ray-casts detecting
- Agents: The environment contains one agent.
- Agent Reward Function (independent):
- +1 For moving to correct goal.
- -0.1 For moving to incorrect goal.
- -0.0003 Existential penalty.
- Behavior Parameters:
- Vector Observation space: 30 corresponding to local ray-casts detecting
* Vector Action space: (Discrete) 1 Branch, 4 actions corresponding to agent
- Vector Action space: (Discrete) 1 Branch, 4 actions corresponding to agent
* Visual Observations (Optional): First-person view for the agent. Use
`VisualHallway` scene. __The visual observation version of
this environment does not train with the provided default
training parameters.__
* Float Properties: None
* Benchmark Mean Reward: 0.7
* To speed up training, you can enable curiosity by adding the `curiosity` reward signal in `config/trainer_config.yaml`
- Visual Observations (Optional): First-person view for the agent. Use
`VisualHallway` scene. **The visual observation version of this environment
does not train with the provided default training parameters.**
- Float Properties: None
- Benchmark Mean Reward: 0.7
- To speed up training, you can enable curiosity by adding the `curiosity`
reward signal in `config/trainer_config.yaml`
* Set-up: Environment where the agent needs on-demand decision making. The agent
- Set-up: Environment where the agent needs on-demand decision making. The agent
* Goal: Catch the floating green cube. Only has a limited number of jumps.
* Agents: The environment contains one agent.
* Agent Reward Function (independent):
* +1 For catching the green cube.
* -1 For bouncing out of bounds.
* -0.05 Times the action squared. Energy expenditure penalty.
* Behavior Parameters:
* Vector Observation space: 6 corresponding to local position of agent and
- Goal: Catch the floating green cube. Only has a limited number of jumps.
- Agents: The environment contains one agent.
- Agent Reward Function (independent):
- +1 For catching the green cube.
- -1 For bouncing out of bounds.
- -0.05 Times the action squared. Energy expenditure penalty.
- Behavior Parameters:
- Vector Observation space: 6 corresponding to local position of agent and
* Vector Action space: (Continuous) 3 corresponding to agent force applied for
- Vector Action space: (Continuous) 3 corresponding to agent force applied for
* Visual Observations: None
* Float Properties: Two
* target_scale: The scale of the green cube in the 3 dimensions
* Default: 150
* Recommended Minimum: 50
* Recommended Maximum: 250
* Benchmark Mean Reward: 10
- Visual Observations: None
- Float Properties: Two
- target_scale: The scale of the green cube in the 3 dimensions
- Default: 150
- Recommended Minimum: 50
- Recommended Maximum: 250
- Benchmark Mean Reward: 10
* Set-up: Environment where four agents compete in a 2 vs 2 toy soccer game.
* Goal:
* Get the ball into the opponent's goal while preventing
the ball from entering its own goal.
* Agents: The environment contains four agents, with the same
Behavior Parameters: Soccer.
* Agent Reward Function (dependent):
* +1 When ball enters opponent's goal.
* -1 When ball enters team's goal.
* -0.001 Existential penalty.
* Behavior Parameters:
* Vector Observation space: 336 corresponding to 11 ray-casts forward distributed over 120 degrees (264)
and 3 ray-casts backward distributed over 90 degrees each detecting 6 possible object types, along with the object's distance.
The forward ray-casts contribute 264 state dimensions and backward 72 state dimensions.
* Vector Action space: (Discrete) Three branched actions corresponding to forward, backward, sideways movement,
as well as rotation.
* Visual Observations: None
* Float Properties: Two
* ball_scale: Specifies the scale of the ball in the 3 dimensions (equal across the three dimensions)
* Default: 7.5
* Recommended minimum: 4
* Recommended maximum: 10
* gravity: Magnitude of the gravity
* Default: 9.81
* Recommended minimum: 6
* Recommended maximum: 20
- Set-up: Environment where four agents compete in a 2 vs 2 toy soccer game.
- Goal:
- Get the ball into the opponent's goal while preventing the ball from
entering its own goal.
- Agents: The environment contains four agents, with the same Behavior
Parameters: Soccer.
- Agent Reward Function (dependent):
- +1 When ball enters opponent's goal.
- -1 When ball enters team's goal.
- -0.001 Existential penalty.
- Behavior Parameters:
- Vector Observation space: 336 corresponding to 11 ray-casts forward
distributed over 120 degrees (264) and 3 ray-casts backward distributed over
90 degrees each detecting 6 possible object types, along with the object's
distance. The forward ray-casts contribute 264 state dimensions and backward
72 state dimensions.
- Vector Action space: (Discrete) Three branched actions corresponding to
forward, backward, sideways movement, as well as rotation.
- Visual Observations: None
- Float Properties: Two
- ball_scale: Specifies the scale of the ball in the 3 dimensions (equal
across the three dimensions)
- Default: 7.5
- Recommended minimum: 4
- Recommended maximum: 10
- gravity: Magnitude of the gravity
- Default: 9.81
- Recommended minimum: 6
- Recommended maximum: 20
* Set-up: Physics-based humanoid agents with 26 degrees of freedom. These DOFs
- Set-up: Physics-based humanoid agents with 26 degrees of freedom. These DOFs
* Goal: The agent must move its body toward the goal direction as quickly as
- Goal: The agent must move its body toward the goal direction as quickly as
* Agents: The environment contains 11 independent agents with the same Behavior Parameters.
* Agent Reward Function (independent):
* +0.03 times body velocity in the goal direction.
* +0.01 times head y position.
* +0.01 times body direction alignment with goal direction.
* -0.01 times head velocity difference from body velocity.
* Behavior Parameters:
* Vector Observation space: 215 variables corresponding to position, rotation,
- Agents: The environment contains 11 independent agents with the same Behavior
Parameters.
- Agent Reward Function (independent):
- +0.03 times body velocity in the goal direction.
- +0.01 times head y position.
- +0.01 times body direction alignment with goal direction.
- -0.01 times head velocity difference from body velocity.
- Behavior Parameters:
- Vector Observation space: 215 variables corresponding to position, rotation,
* Vector Action space: (Continuous) Size of 39, corresponding to target
- Vector Action space: (Continuous) Size of 39, corresponding to target
* Visual Observations: None
* Float Properties: Four
* gravity: Magnitude of gravity
* Default: 9.81
* Recommended Minimum:
* Recommended Maximum:
* hip_mass: Mass of the hip component of the walker
* Default: 15
* Recommended Minimum: 7
* Recommended Maximum: 28
* chest_mass: Mass of the chest component of the walker
* Default: 8
* Recommended Minimum: 3
* Recommended Maximum: 20
* spine_mass: Mass of the spine component of the walker
* Default: 10
* Recommended Minimum: 3
* Recommended Maximum: 20
* Benchmark Mean Reward: 1000
- Visual Observations: None
- Float Properties: Four
- gravity: Magnitude of gravity
- Default: 9.81
- Recommended Minimum:
- Recommended Maximum:
- hip_mass: Mass of the hip component of the walker
- Default: 15
- Recommended Minimum: 7
- Recommended Maximum: 28
- chest_mass: Mass of the chest component of the walker
- Default: 8
- Recommended Minimum: 3
- Recommended Maximum: 20
- spine_mass: Mass of the spine component of the walker
- Default: 10
- Recommended Minimum: 3
- Recommended Maximum: 20
- Benchmark Mean Reward: 1000
* Set-up: Environment where the agent needs to press a button to spawn a
- Set-up: Environment where the agent needs to press a button to spawn a
* Goal: Move to the golden brick on top of the spawned pyramid.
* Agents: The environment contains one agent.
* Agent Reward Function (independent):
* +2 For moving to golden brick (minus 0.001 per step).
* Behavior Parameters:
* Vector Observation space: 148 corresponding to local ray-casts detecting
- Goal: Move to the golden brick on top of the spawned pyramid.
- Agents: The environment contains one agent.
- Agent Reward Function (independent):
- +2 For moving to golden brick (minus 0.001 per step).
- Behavior Parameters:
- Vector Observation space: 148 corresponding to local ray-casts detecting
* Vector Action space: (Discrete) 4 corresponding to agent rotation and
- Vector Action space: (Discrete) 4 corresponding to agent rotation and
* Visual Observations (Optional): First-person camera per-agent. Use
`VisualPyramids` scene. __The visual observation version of
this environment does not train with the provided default
training parameters.__
* Float Properties: None
* Benchmark Mean Reward: 1.75
- Visual Observations (Optional): First-person camera per-agent. Use
`VisualPyramids` scene. **The visual observation version of this environment
does not train with the provided default training parameters.**
- Float Properties: None
- Benchmark Mean Reward: 1.75

54
docs/Learning-Environment-Executable.md


Editor to interact with an environment. Using an executable has some advantages
over using the Editor:
* You can exchange the executable with other people without having to share your
- You can exchange the executable with other people without having to share your
* You can put your executable on a remote machine for faster training.
* You can use `Headless` mode for faster training.
* You can keep using the Unity Editor for other tasks while the agents are
- You can put your executable on a remote machine for faster training.
- You can use `Headless` mode for faster training.
- You can keep using the Unity Editor for other tasks while the agents are
training.
## Building the 3DBall environment

Next, we want the scene we set up to play correctly when the training process
launches our environment executable. This means:
* The environment application runs in the background.
* No dialogs require interaction.
* The correct scene loads automatically.
- The environment application runs in the background.
- No dialogs require interaction.
- The correct scene loads automatically.
* Ensure that **Run in Background** is Checked.
* Ensure that **Display Resolution Dialog** is set to Disabled.
- Ensure that **Run in Background** is Checked.
- Ensure that **Display Resolution Dialog** is set to Disabled.
* (optional) Select “Development Build” to [log debug
messages](https://docs.unity3d.com/Manual/LogFiles.html).
- (optional) Select “Development Build” to
[log debug messages](https://docs.unity3d.com/Manual/LogFiles.html).
* In the File dialog, navigate to your ML-Agents directory.
* Assign a file name and click **Save**.
* (For Windows) With Unity 2018.1, it will ask you to select a folder instead
- In the File dialog, navigate to your ML-Agents directory.
- Assign a file name and click **Save**.
- (For Windows) With Unity 2018.1, it will ask you to select a folder instead
subfolder's name as `env_name`. You cannot create builds in the Assets folder
subfolder's name as `env_name`. You cannot create builds in the Assets
folder
![Build Window](images/mlagents-BuildWindow.png)

1. Run
`mlagents-learn <trainer-config-file> --env=<env_name> --run-id=<run-identifier>`
Where:
* `<trainer-config-file>` is the file path of the trainer configuration yaml
* `<env_name>` is the name and path to the executable you exported from Unity
- `<trainer-config-file>` is the file path of the trainer configuration yaml
- `<env_name>` is the name and path to the executable you exported from Unity
* `<run-identifier>` is a string used to separate the results of different
- `<run-identifier>` is a string used to separate the results of different
training runs
For example, if you are training with a 3DBall executable you exported to the

```
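For instance, a minimal invocation might look like the sketch below (the config path, environment name, and run id are illustrative placeholders, not values taken from this page):

```sh
# Hypothetical example: train the 3DBall executable built above with the
# default trainer configuration and a custom run identifier.
mlagents-learn config/trainer_config.yaml --env=3DBall --run-id=firstRun
```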
You can press Ctrl+C to stop the training, and your trained model will be at
`models/<run-identifier>/<behavior_name>.nn`, which corresponds
to your model's latest checkpoint. (**Note:** There is a known bug on Windows
that causes the saving of the model to fail when you terminate the training
early; it is recommended to wait until Step has reached the max_steps
parameter you set in trainer_config.yaml.) You can now embed this trained model
into your Agent by following the steps below:
`models/<run-identifier>/<behavior_name>.nn`, which corresponds to your model's
latest checkpoint. (**Note:** There is a known bug on Windows that causes the
saving of the model to fail when you terminate the training early; it is
recommended to wait until Step has reached the max_steps parameter you set in
trainer_config.yaml.) You can now embed this trained model into your Agent by
following the steps below:
1. Drag the `<behavior_name>.nn` file from the Project window of
the Editor to the **Model** placeholder in the **Ball3DAgent**
inspector window.
1. Drag the `<behavior_name>.nn` file from the Project window of the Editor to
the **Model** placeholder in the **Ball3DAgent** inspector window.
1. Press the :arrow_forward: button at the top of the editor.

99
docs/Readme.md


## Installation & Set-up
* [Installation](Installation.md)
* [Using Virtual Environment](Using-Virtual-Environment.md)
- [Installation](Installation.md)
- [Using Virtual Environment](Using-Virtual-Environment.md)
* [Getting Started Guide](Getting-Started.md)
* [ML-Agents Toolkit Overview](ML-Agents-Overview.md)
* [Background: Unity](Background-Unity.md)
* [Background: Machine Learning](Background-Machine-Learning.md)
* [Background: TensorFlow](Background-TensorFlow.md)
* [Example Environments](Learning-Environment-Examples.md)
- [Getting Started Guide](Getting-Started.md)
- [ML-Agents Toolkit Overview](ML-Agents-Overview.md)
- [Background: Unity](Background-Unity.md)
- [Background: Machine Learning](Background-Machine-Learning.md)
- [Background: TensorFlow](Background-TensorFlow.md)
- [Example Environments](Learning-Environment-Examples.md)
* [Making a New Learning Environment](Learning-Environment-Create-New.md)
* [Designing a Learning Environment](Learning-Environment-Design.md)
* [Designing Agents](Learning-Environment-Design-Agents.md)
- [Making a New Learning Environment](Learning-Environment-Create-New.md)
- [Designing a Learning Environment](Learning-Environment-Design.md)
- [Designing Agents](Learning-Environment-Design-Agents.md)
* [Using the Monitor](Feature-Monitor.md)
* [Using an Executable Environment](Learning-Environment-Executable.md)
- [Using the Monitor](Feature-Monitor.md)
- [Using an Executable Environment](Learning-Environment-Executable.md)
* [Training ML-Agents](Training-ML-Agents.md)
* [Reward Signals](Reward-Signals.md)
* [Profiling Trainers](Profiling-Python.md)
* [Using TensorBoard to Observe Training](Using-Tensorboard.md)
* [Training Using Concurrent Unity Instances](Training-Using-Concurrent-Unity-Instances.md)
* [Training with Proximal Policy Optimization](Training-PPO.md)
* [Training with Soft Actor-Critic](Training-SAC.md)
* [Training with Self-Play](Training-Self-Play.md)
- [Training ML-Agents](Training-ML-Agents.md)
- [Reward Signals](Reward-Signals.md)
- [Profiling Trainers](Profiling-Python.md)
- [Using TensorBoard to Observe Training](Using-Tensorboard.md)
- [Training Using Concurrent Unity Instances](Training-Using-Concurrent-Unity-Instances.md)
- [Training with Proximal Policy Optimization](Training-PPO.md)
- [Training with Soft Actor-Critic](Training-SAC.md)
- [Training with Self-Play](Training-Self-Play.md)
* [Training with Curriculum Learning](Training-Curriculum-Learning.md)
* [Training with Imitation Learning](Training-Imitation-Learning.md)
* [Training with LSTM](Feature-Memory.md)
* [Training with Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md)
- [Training with Curriculum Learning](Training-Curriculum-Learning.md)
- [Training with Imitation Learning](Training-Imitation-Learning.md)
- [Training with LSTM](Feature-Memory.md)
- [Training with Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md)
* [Unity Inference Engine](Unity-Inference-Engine.md)
- [Unity Inference Engine](Unity-Inference-Engine.md)
* [Creating Custom Side Channels](Custom-SideChannels.md)
- [Creating Custom Side Channels](Custom-SideChannels.md)
* [Migrating from earlier versions of ML-Agents](Migrating.md)
* [Frequently Asked Questions](FAQ.md)
* [ML-Agents Glossary](Glossary.md)
* [Limitations](Limitations.md)
- [Migrating from earlier versions of ML-Agents](Migrating.md)
- [Frequently Asked Questions](FAQ.md)
- [ML-Agents Glossary](Glossary.md)
- [Limitations](Limitations.md)
* [API Reference](API-Reference.md)
* [How to use the Python API](Python-API.md)
* [Wrapping Learning Environment as a Gym (+Baselines/Dopamine Integration)](../gym-unity/README.md)
- [API Reference](API-Reference.md)
- [How to use the Python API](Python-API.md)
- [Wrapping Learning Environment as a Gym (+Baselines/Dopamine Integration)](../gym-unity/README.md)
To make the Unity ML-Agents toolkit accessible to the global research and
Unity developer communities, we're attempting to create and maintain
translations of our documentation. We've started with translating a subset
of the documentation to one language (Chinese), but we hope to continue
translating more pages and to other languages. Consequently,
we welcome any enhancements and improvements from the community.
To make the Unity ML-Agents toolkit accessible to the global research and Unity
developer communities, we're attempting to create and maintain translations of
our documentation. We've started with translating a subset of the documentation
to one language (Chinese), but we hope to continue translating more pages and to
other languages. Consequently, we welcome any enhancements and improvements from
the community.
* [Chinese](localized/zh-CN/)
* [Korean](localized/KR/)
- [Chinese](localized/zh-CN/)
- [Korean](localized/KR/)
We no longer use them ourselves and so they may not be up-to-date.
We've decided to keep them up just in case they are helpful to you.
* [Windows Anaconda Installation](Installation-Anaconda-Windows.md)
* [Using Docker](Using-Docker.md)
* [Training on the Cloud with Amazon Web Services](Training-on-Amazon-Web-Service.md)
* [Training on the Cloud with Microsoft Azure](Training-on-Microsoft-Azure.md)
* [Using the Video Recorder](https://github.com/Unity-Technologies/video-recorder)
We no longer use them ourselves and so they may not be up-to-date. We've decided
to keep them up just in case they are helpful to you.
- [Windows Anaconda Installation](Installation-Anaconda-Windows.md)
- [Using Docker](Using-Docker.md)
- [Training on the Cloud with Amazon Web Services](Training-on-Amazon-Web-Service.md)
- [Training on the Cloud with Microsoft Azure](Training-on-Microsoft-Azure.md)
- [Using the Video Recorder](https://github.com/Unity-Technologies/video-recorder)

147
docs/Training-on-Amazon-Web-Service.md


# Training on Amazon Web Service
:warning: **Note:** We no longer use this guide ourselves and so it may not work correctly. We've
decided to keep it up just in case it is helpful to you.
:warning: **Note:** We no longer use this guide ourselves and so it may not work
correctly. We've decided to keep it up just in case it is helpful to you.
This page contains instructions for setting up an EC2 instance on Amazon Web
Service for training ML-Agents environments.

We've prepared a pre-configured AMI for you with the ID: `ami-016ff5559334f8619` in the
`us-east-1` region. It was created as a modification of [Deep Learning AMI
(Ubuntu)](https://aws.amazon.com/marketplace/pp/B077GCH38C). The AMI has been
tested with a p2.xlarge instance. Furthermore, if you want to train without
headless mode, you need to enable X Server.
We've prepared a pre-configured AMI for you with the ID: `ami-016ff5559334f8619`
in the `us-east-1` region. It was created as a modification of
[Deep Learning AMI (Ubuntu)](https://aws.amazon.com/marketplace/pp/B077GCH38C).
The AMI has been tested with a p2.xlarge instance. Furthermore, if you want to
train without headless mode, you need to enable X Server.
After launching your EC2 instance using the AMI and SSHing into it, run the
following commands to enable it:

1. Activate the python3 environment
```sh
source activate python3
```
```sh
source activate python3
```
```sh
git clone --branch latest_release https://github.com/Unity-Technologies/ml-agents.git
cd ml-agents/ml-agents/
pip3 install -e .
```
```sh
git clone --branch latest_release https://github.com/Unity-Technologies/ml-agents.git
cd ml-agents/ml-agents/
pip3 install -e .
```
### Setting up X Server (optional)

#### Make sure there are no Xorg processes running:
```sh
# Kill any possible running Xorg processes
# Note that you might have to run this command multiple times depending on
# how Xorg is configured.
$ sudo killall Xorg
```sh
# Kill any possible running Xorg processes
# Note that you might have to run this command multiple times depending on
# how Xorg is configured.
$ sudo killall Xorg
# Check if there is any Xorg process left
# You will have a list of processes running on the GPU, Xorg should not be in
# the list, as shown below.
$ nvidia-smi
# Check if there is any Xorg process left
# You will have a list of processes running on the GPU, Xorg should not be in
# the list, as shown below.
$ nvidia-smi
# Thu Jun 14 20:21:11 2018
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 390.67 Driver Version: 390.67 |
# |-------------------------------+----------------------+----------------------+
# | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
# | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
# |===============================+======================+======================|
# | 0 Tesla K80 On | 00000000:00:1E.0 Off | 0 |
# | N/A 37C P8 31W / 149W | 0MiB / 11441MiB | 0% Default |
# +-------------------------------+----------------------+----------------------+
#
# +-----------------------------------------------------------------------------+
# | Processes: GPU Memory |
# | GPU PID Type Process name Usage |
# |=============================================================================|
# | No running processes found |
# +-----------------------------------------------------------------------------+
# Thu Jun 14 20:21:11 2018
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 390.67 Driver Version: 390.67 |
# |-------------------------------+----------------------+----------------------+
# | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
# | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
# |===============================+======================+======================|
# | 0 Tesla K80 On | 00000000:00:1E.0 Off | 0 |
# | N/A 37C P8 31W / 149W | 0MiB / 11441MiB | 0% Default |
# +-------------------------------+----------------------+----------------------+
#
# +-----------------------------------------------------------------------------+
# | Processes: GPU Memory |
# | GPU PID Type Process name Usage |
# |=============================================================================|
# | No running processes found |
# +-----------------------------------------------------------------------------+
```
```
#### Start X Server and make Ubuntu use X Server for display:

can use one of the example environments if you have not created your own).
2. Open the Build Settings window (menu: File > Build Settings).
3. Select Linux as the Target Platform, and x86_64 as the target architecture
(the default x86 currently does not work).
(the default x86 currently does not work).
Headless Mode, you have to set up the X Server to enable training.)
Headless Mode, you have to set up the X Server to enable training.)
```sh
chmod +x <your_env>.x86_64
```
```sh
chmod +x <your_env>.x86_64
```
```sh
# Start the X Server, press Enter to come back to the command line
$ sudo /usr/bin/X :0 &
```sh
# Start the X Server, press Enter to come back to the command line
$ sudo /usr/bin/X :0 &
# Check if Xorg process is running
# You will have a list of processes running on the GPU, Xorg should be in the list.
$ nvidia-smi
# Check if Xorg process is running
# You will have a list of processes running on the GPU, Xorg should be in the list.
$ nvidia-smi
# Make the ubuntu use X Server for display
$ export DISPLAY=:0
```
# Make the ubuntu use X Server for display
$ export DISPLAY=:0
```
```python
from mlagents_envs.environment import UnityEnvironment
```python
from mlagents_envs.environment import UnityEnvironment
env = UnityEnvironment(<your_env>)
```
env = UnityEnvironment(<your_env>)
```
Where `<your_env>` corresponds to the path to your environment executable.
Where `<your_env>` corresponds to the path to your environment executable.
You should receive a message confirming that the environment was loaded successfully.
You should receive a message confirming that the environment was loaded
successfully.
10. Train your models
```console

## FAQ
### The <Executable_Name>_Data folder hasn't been copied over
### The <Executable_Name>\_Data folder hasn't been copied over
If you've built your Linux executable but forgot to copy over the corresponding <Executable_Name>_Data folder, you will see an error message like the following:
If you've built your Linux executable but forgot to copy over the corresponding
<Executable_Name>\_Data folder, you will see an error message like the following:
```sh
Set current directory to /home/ubuntu/ml-agents/ml-agents

### Unity Environment not responding
If you didn't set up X Server, haven't launched it properly, your environment somehow crashes, or you haven't run `chmod +x` on your Unity environment, the connection between Unity and Python will fail. Then you will see something like this:
If you didn't set up X Server, haven't launched it properly, your environment
somehow crashes, or you haven't run `chmod +x` on your Unity environment, the
connection between Unity and Python will fail. Then you will see something like
this:
```console
Logging to /home/ubuntu/.config/unity3d/<Some_Path>/Player.log

The environment and the Python interface have compatible versions.
```
It would also be really helpful to check your /home/ubuntu/.config/unity3d/<Some_Path>/Player.log to see what is happening with your Unity environment.
It would also be really helpful to check your
/home/ubuntu/.config/unity3d/<Some_Path>/Player.log to see what is happening
with your Unity environment.
### Could not launch X Server

```sh
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
```
This means the NVIDIA driver needs to be updated. Refer to [this section](Training-on-Amazon-Web-Service.md#update-and-setup-nvidia-driver) for more information.
This means the NVIDIA driver needs to be updated. Refer to
[this section](Training-on-Amazon-Web-Service.md#update-and-setup-nvidia-driver)
for more information.

73
docs/Training-on-Microsoft-Azure.md


# Training on Microsoft Azure (works with ML-Agents toolkit v0.3)
:warning: **Note:** We no longer use this guide ourselves and so it may not work correctly. We've
decided to keep it up just in case it is helpful to you.
:warning: **Note:** We no longer use this guide ourselves and so it may not work
correctly. We've decided to keep it up just in case it is helpful to you.
This page contains instructions for setting up training on Microsoft Azure
through either

## Pre-Configured Azure Virtual Machine
A pre-configured virtual machine image is available in the Azure Marketplace and
is nearly completely ready for training. You can start by deploying the
is nearly completely ready for training. You can start by deploying the
training will, by default, run on the GPU. If you choose any other type of VM,
training will, by default, run on the GPU. If you choose any other type of VM,
Setting up your own instance requires a number of package installations. Please
view the documentation for doing so
[here](#custom-instances).
Setting up your own instance requires a number of package installations. Please
view the documentation for doing so [here](#custom-instances).
## Installing ML-Agents

To run your training on the VM:
1. [Move](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/copy-files-to-linux-vm-using-scp)
your built Unity application to your Virtual Machine.
2. Set the directory where the ML-Agents Toolkit was installed to your
working directory.
your built Unity application to your Virtual Machine.
2. Set the directory where the ML-Agents Toolkit was installed to your working
directory.
3. Run the following command:
```sh

## Monitoring your Training Run with TensorBoard
Once you have started training, you can [use TensorBoard to observe the
training](Using-Tensorboard.md).
Once you have started training, you can
[use TensorBoard to observe the training](Using-Tensorboard.md).
1. Start by [opening the appropriate port for web traffic to connect to your VM](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/nsg-quickstart-portal).
1. Start by
[opening the appropriate port for web traffic to connect to your VM](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/nsg-quickstart-portal).
* Note that you don't need to generate a new `Network Security Group` but
instead, go to the **Networking** tab under **Settings** for your VM.
* As an example, you could use the following settings to open the Port with
the following Inbound Rule settings:
* Source: Any
* Source Port Ranges: *
* Destination: Any
* Destination Port Ranges: 6006
* Protocol: Any
* Action: Allow
* Priority: (Leave as default)
- Note that you don't need to generate a new `Network Security Group` but
instead, go to the **Networking** tab under **Settings** for your VM.
- As an example, you could use the following settings to open the Port with
the following Inbound Rule settings:
- Source: Any
- Source Port Ranges: \*
- Destination: Any
- Destination Port Ranges: 6006
- Protocol: Any
- Action: Allow
- Priority: (Leave as default)
2. Unless you started the training as a background process, connect to your VM
from another terminal instance.

[Azure Container Instances](https://azure.microsoft.com/services/container-instances/)
allow you to spin up a container, on demand, that will run your training and
then be shut down. This ensures you aren't leaving a billable VM running when
it isn't needed. Using ACI enables you to offload training of your models without needing to
install Python and TensorFlow on your own computer.
then be shut down. This ensures you aren't leaving a billable VM running when it
isn't needed. Using ACI enables you to offload training of your models without
needing to install Python and TensorFlow on your own computer.
This page contains instructions for setting up a custom Virtual Machine on Microsoft Azure so you can run ML-Agents training in the cloud.
This page contains instructions for setting up a custom Virtual Machine on
Microsoft Azure so you can run ML-Agents training in the cloud.
with Ubuntu Linux (tests were done with 16.04 LTS). To use GPU support, use
an N-Series VM.
with Ubuntu Linux (tests were done with 16.04 LTS). To use GPU support, use an
N-Series VM.
2. SSH into your VM.
3. Start with the following commands to install the Nvidia driver:

sudo apt-get install cuda-8-0
```
5. You'll next need to download cuDNN from the Nvidia developer site. This
5. You'll next need to download cuDNN from the Nvidia developer site. This
7. Download (to your own computer) cuDNN from [this url](https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v6/prod/8.0_20170307/Ubuntu16_04_x64/libcudnn6_6.0.20-1+cuda8.0_amd64-deb).
7. Download (to your own computer) cuDNN from
[this url](https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v6/prod/8.0_20170307/Ubuntu16_04_x64/libcudnn6_6.0.20-1+cuda8.0_amd64-deb).
8. Copy the deb package to your VM:

sudo reboot
```
10. After a minute, you should be able to SSH back into your VM. After doing
so, run the following:
10. After a minute, you should be able to SSH back into your VM. After doing so,
run the following:
```sh
sudo apt install python-pip

11. At this point, you need to install TensorFlow. The version you install
11. At this point, you need to install TensorFlow. The version you install
should depend on whether you are using a GPU to train:
```sh

pip3 install pillow
pip3 install numpy
```
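The TensorFlow install command itself is not shown in this excerpt; as a rough, hedged sketch (the package choice depends on the VM, and the version should be pinned to whatever the ML-Agents release you are using requires):

```sh
# Assumed sketch only: pick one of the following.
pip3 install tensorflow-gpu   # GPU training (requires the CUDA/cuDNN setup above)
pip3 install tensorflow       # CPU-only training
```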

34
docs/Using-Docker.md


# Using Docker For ML-Agents (Deprecated)
:warning: **Note:** We no longer use this guide ourselves and so it may not work correctly. We've
decided to keep it up just in case it is helpful to you.
:warning: **Note:** We no longer use this guide ourselves and so it may not work
correctly. We've decided to keep it up just in case it is helpful to you.
We currently offer a solution for Windows and Mac users who would like to do
training or inference using Docker. This option may be appealing to those who

## Requirements
- [Docker](https://www.docker.com)
- Unity _Linux Build Support_ Component. Make sure to select the _Linux
Build Support_ component when installing Unity.
- Unity _Linux Build Support_ Component. Make sure to select the _Linux Build
Support_ component when installing Unity.
<p align="center">
<img src="images/unity_linux_build_support.png"

Using Docker for ML-Agents involves three steps: building the Unity environment
with specific flags, building a Docker container and, finally, running the
container. If you are not familiar with building a Unity environment for
ML-Agents, please read through our [Getting Started with the 3D Balance Ball
Example](Getting-Started.md) guide first.
ML-Agents, please read through our
[Getting Started with the 3D Balance Ball Example](Getting-Started.md) guide
first.
### Build the Environment (Optional)

random name if this is not set. _Note that this must be unique for every run
of a Docker image._
- `<image-name>` references the image name used when building the container.
- `<environment-name>` __(Optional)__: If you are training with a linux
- `<environment-name>` **(Optional)**: If you are training with a linux
executable, this is the name of the executable. If you are training in the
Editor, do not pass a `<environment-name>` argument and press the
:arrow_forward: button in Unity when the message _"Start training by pressing

For more detail on Docker mounts, check out
[these](https://docs.docker.com/storage/bind-mounts/) docs from Docker.
**NOTE** If you are training using docker for environments that use visual observations, you may need to increase the default memory that Docker allocates for the container. For example, see [here](https://docs.docker.com/docker-for-mac/#advanced) for instructions for Docker for Mac.
**NOTE** If you are training using docker for environments that use visual
observations, you may need to increase the default memory that Docker allocates
for the container. For example, see
[here](https://docs.docker.com/docker-for-mac/#advanced) for instructions for
Docker for Mac.
You can run Tensorboard to monitor your training instance on http://localhost:6006:
You can run Tensorboard to monitor your training instance on
http://localhost:6006:
```sh
docker exec -it <container-name> tensorboard --logdir=/unity-volume/summaries --host=0.0.0.0

For more details on Tensorboard, check out the documentation about [Using Tensorboard](Using-Tensorboard.md).
For more details on Tensorboard, check out the documentation about
[Using Tensorboard](Using-Tensorboard.md).
### Stopping Container and Saving State

docker kill --signal=SIGINT <container-name>
```
`<container-name>` is the name of the container specified in the earlier `docker
run` command. If you didn't specify one, you can find the randomly generated
identifier by running `docker container ls`.
`<container-name>` is the name of the container specified in the earlier
`docker run` command. If you didn't specify one, you can find the randomly
generated identifier by running `docker container ls`.
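Concretely, this amounts to the following small sketch, using only the standard Docker CLI commands already mentioned above:

```sh
# List running containers to find the (possibly auto-generated) name,
# then send SIGINT so the trainer saves its state before the container stops.
docker container ls
docker kill --signal=SIGINT <container-name>
```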

73
docs/Using-Tensorboard.md


1. Open a terminal or console window:
1. Navigate to the directory where the ML-Agents Toolkit is installed.
1. From the command line run: `tensorboard --logdir=summaries --port=6006`
1. Open a browser window and navigate to [localhost:6006](http://localhost:6006).
1. Open a browser window and navigate to
[localhost:6006](http://localhost:6006).
**Note:** The default port TensorBoard uses is 6006. If there is an existing session
running on port 6006 a new session can be launched on an open port using the --port
option.
**Note:** The default port TensorBoard uses is 6006. If there is an existing
session running on port 6006 a new session can be launched on an open port using
the --port option.
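For example, assuming the summaries are in the default `summaries` directory and port 6006 is already in use:

```sh
# Launch TensorBoard on an alternative port.
tensorboard --logdir=summaries --port=6007
```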
**Note:** If you don't assign a `run-id` identifier, `mlagents-learn` uses the
default string, "ppo". All the statistics will be saved to the same sub-folder

### Environment Statistics
* `Environment/Lesson` - Plots the progress from lesson to lesson. Only interesting when
performing [curriculum training](Training-Curriculum-Learning.md).
- `Environment/Lesson` - Plots the progress from lesson to lesson. Only
interesting when performing
[curriculum training](Training-Curriculum-Learning.md).
* `Environment/Cumulative Reward` - The mean cumulative episode reward over all agents. Should
increase during a successful training session.
- `Environment/Cumulative Reward` - The mean cumulative episode reward over all
agents. Should increase during a successful training session.
* `Environment/Episode Length` - The mean length of each episode in the environment for all agents.
- `Environment/Episode Length` - The mean length of each episode in the
environment for all agents.
* `Policy/Entropy` (PPO; BC) - How random the decisions of the model are. Should slowly decrease
during a successful training process. If it decreases too quickly, the `beta`
hyperparameter should be increased.
- `Policy/Entropy` (PPO; BC) - How random the decisions of the model are. Should
slowly decrease during a successful training process. If it decreases too
quickly, the `beta` hyperparameter should be increased.
* `Policy/Learning Rate` (PPO; BC) - How large a step the training algorithm takes as it searches
for the optimal policy. Should decrease over time.
- `Policy/Learning Rate` (PPO; BC) - How large a step the training algorithm
takes as it searches for the optimal policy. Should decrease over time.
* `Policy/Value Estimate` (PPO) - The mean value estimate for all states visited by the agent. Should increase during a successful training session.
- `Policy/Value Estimate` (PPO) - The mean value estimate for all states visited
by the agent. Should increase during a successful training session.
* `Policy/Curiosity Reward` (PPO+Curiosity) - This corresponds to the mean cumulative intrinsic reward generated per-episode.
- `Policy/Curiosity Reward` (PPO+Curiosity) - This corresponds to the mean
cumulative intrinsic reward generated per-episode.
* `Losses/Policy Loss` (PPO) - The mean magnitude of policy loss function. Correlates to how
much the policy (process for deciding actions) is changing. The magnitude of
this should decrease during a successful training session.
- `Losses/Policy Loss` (PPO) - The mean magnitude of policy loss function.
Correlates to how much the policy (process for deciding actions) is changing.
The magnitude of this should decrease during a successful training session.
* `Losses/Value Loss` (PPO) - The mean loss of the value function update. Correlates to how
well the model is able to predict the value of each state. This should
increase while the agent is learning, and then decrease once the reward
stabilizes.
- `Losses/Value Loss` (PPO) - The mean loss of the value function update.
Correlates to how well the model is able to predict the value of each state.
This should increase while the agent is learning, and then decrease once the
reward stabilizes.
* `Losses/Forward Loss` (PPO+Curiosity) - The mean magnitude of the inverse model
loss function. Corresponds to how well the model is able to predict the new
observation encoding.
- `Losses/Forward Loss` (PPO+Curiosity) - The mean magnitude of the inverse
model loss function. Corresponds to how well the model is able to predict the
new observation encoding.
* `Losses/Inverse Loss` (PPO+Curiosity) - The mean magnitude of the forward model
loss function. Corresponds to how well the model is able to predict the action
taken between two observations.
- `Losses/Inverse Loss` (PPO+Curiosity) - The mean magnitude of the forward
model loss function. Corresponds to how well the model is able to predict the
action taken between two observations.
* `Losses/Cloning Loss` (BC) - The mean magnitude of the behavioral cloning loss. Corresponds to how well the model imitates the demonstration data.
- `Losses/Cloning Loss` (BC) - The mean magnitude of the behavioral cloning
loss. Corresponds to how well the model imitates the demonstration data.
## Custom Metrics from C#
To get custom metrics from a C# environment into Tensorboard, you can use the StatsSideChannel:
## Custom Metrics from Unity
To get custom metrics from a C# environment into Tensorboard, you can use the
StatsSideChannel:
```csharp
var statsSideChannel = SideChannelUtils.GetSideChannel<StatsSideChannel>();
statsSideChannel.AddStat("MyMetric", 1.0);
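// The value is forwarded over the side channel and shows up in TensorBoard
// alongside the built-in training statistics.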

63
docs/Using-Virtual-Environment.md


# Using Virtual Environment
## What is a Virtual Environment?
A Virtual Environment is a self-contained directory tree that contains a Python installation
for a particular version of Python, plus a number of additional packages. To learn more about
Virtual Environments see [here](https://docs.python.org/3/library/venv.html).
A Virtual Environment is a self-contained directory tree that contains a Python
installation for a particular version of Python, plus a number of additional
packages. To learn more about Virtual Environments see
[here](https://docs.python.org/3/library/venv.html).
A Virtual Environment keeps all dependencies for the Python project separate from dependencies
of other projects. This has a few advantages:
A Virtual Environment keeps all dependencies for the Python project separate
from dependencies of other projects. This has a few advantages:
spinning up a new environment and verifying the compatibility of the code with the
different version.
spinning up a new environment and verifying the compatibility of the code
with the different version.
This guide has been tested with Python 3.6 and 3.7. Python 3.8 is not supported at this time.
This guide has been tested with Python 3.6 and 3.7. Python 3.8 is not supported
at this time.
1. Download the `get-pip.py` file using the command `curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py`
1. Download the `get-pip.py` file using the command
`curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py`
Note (for Ubuntu users): If the `ModuleNotFoundError: No module named 'distutils.util'` error is encountered, then
python3-distutils needs to be installed. Install python3-distutils using `sudo apt-get install python3-distutils`
Note (for Ubuntu users): If the
`ModuleNotFoundError: No module named 'distutils.util'` error is encountered,
then python3-distutils needs to be installed. Install python3-distutils using
`sudo apt-get install python3-distutils`
1. Create a folder where the virtual environments will reside `$ mkdir ~/python-envs`
1. To create a new environment named `sample-env` execute `$ python3 -m venv ~/python-envs/sample-env`
1. To activate the environment execute `$ source ~/python-envs/sample-env/bin/activate`
1. Create a folder where the virtual environments will reside
`$ mkdir ~/python-envs`
1. To create a new environment named `sample-env` execute
`$ python3 -m venv ~/python-envs/sample-env`
1. To activate the environment execute
`$ source ~/python-envs/sample-env/bin/activate`
1. Upgrade to the latest setuptools version using `$ pip3 install --upgrade setuptools`
1. To deactivate the environment execute `$ deactivate` (you can reactivate the environment
using the same `activate` command listed above)
1. Upgrade to the latest setuptools version using
`$ pip3 install --upgrade setuptools`
1. To deactivate the environment execute `$ deactivate` (you can reactivate the
environment using the same `activate` command listed above)
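Putting the macOS/Linux steps above together, the full sequence looks roughly like this (a sketch using the same `sample-env` name and folder as above):

```sh
# Create a home for virtual environments and a new env called sample-env.
mkdir -p ~/python-envs
python3 -m venv ~/python-envs/sample-env

# Activate it, upgrade setuptools, and deactivate when finished.
source ~/python-envs/sample-env/bin/activate
pip3 install --upgrade setuptools
deactivate
```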
## Ubuntu Setup

## Windows Setup
1. Create a folder where the virtual environments will reside `md python-envs`
1. To create a new environment named `sample-env` execute `python -m venv python-envs\sample-env`
1. To create a new environment named `sample-env` execute
`python -m venv python-envs\sample-env`
1. To deactivate the environment execute `deactivate` (you can reactivate the environment
using the same `activate` command listed above)
1. To deactivate the environment execute `deactivate` (you can reactivate the
environment using the same `activate` command listed above)
* Verify that you are using Python 3.6 or Python 3.7. Launch a command prompt using `cmd` and
execute `python --version` to verify the version.
* Python3 installation may require admin privileges on Windows.
* This guide is for Windows 10 using a 64-bit architecture only.
- Verify that you are using Python 3.6 or Python 3.7. Launch a command prompt
using `cmd` and execute `python --version` to verify the version.
- Python3 installation may require admin privileges on Windows.
- This guide is for Windows 10 using a 64-bit architecture only.