
Merge pull request #1083 from Unity-Technologies/develop-flat-code-restructure

ML-Agents Code Restructure
/develop-generalizationTraining-TrainerController
GitHub · 6 years ago
Current commit: 3900ed66
170 files changed, with 1720 insertions and 1506 deletions.
  1. .gitignore (49)
  2. Dockerfile (10)
  3. docs/API-Reference.md (25)
  4. docs/Background-Jupyter.md (15)
  5. docs/Basic-Guide.md (146)
  6. docs/FAQ.md (113)
  7. docs/Feature-Memory.md (2)
  8. docs/Getting-Started-with-Balance-Ball.md (380)
  9. docs/Glossary.md (68)
  10. docs/Installation-Windows.md (2)
  11. docs/Installation.md (57)
  12. docs/Learning-Environment-Create-New.md (12)
  13. docs/Learning-Environment-Design-External-Internal-Brains.md (2)
  14. docs/Learning-Environment-Examples.md (2)
  15. docs/Learning-Environment-Executable.md (28)
  16. docs/ML-Agents-Overview.md (685)
  17. docs/Python-API.md (138)
  18. docs/Readme.md (81)
  19. docs/Training-Curriculum-Learning.md (23)
  20. docs/Training-Imitation-Learning.md (4)
  21. docs/Training-ML-Agents.md (34)
  22. docs/Training-on-Amazon-Web-Service.md (2)
  23. docs/Training-on-Microsoft-Azure.md (6)
  24. docs/Using-Docker.md (115)
  25. docs/Using-Tensorboard.md (6)
  26. docs/dox-ml-agents.conf (8)
  27. gym-unity/Readme.md (16)
  28. gym-unity/gym_unity/envs/unity_env.py (2)
  29. gym-unity/setup.py (2)
  30. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/UnityToExternalGrpc.cs.meta (2)
  31. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/UnityToExternalGrpc.cs (10)
  32. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/UnityToExternal.cs.meta (2)
  33. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/UnityToExternal.cs (19)
  34. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/UnityRlOutput.cs.meta (2)
  35. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/UnityRlOutput.cs (27)
  36. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/UnityRlInput.cs.meta (2)
  37. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/UnityRlInput.cs (41)
  38. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/UnityRlInitializationOutput.cs.meta (2)
  39. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/UnityRlInitializationOutput.cs (31)
  40. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/UnityRlInitializationInput.cs.meta (2)
  41. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/UnityRlInitializationInput.cs (14)
  42. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/UnityOutput.cs.meta (2)
  43. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/UnityOutput.cs (29)
  44. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/UnityMessage.cs.meta (2)
  45. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/UnityMessage.cs (32)
  46. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/UnityInput.cs.meta (2)
  47. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/UnityInput.cs (27)
  48. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/SpaceTypeProto.cs.meta (2)
  49. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/SpaceTypeProto.cs (17)
  50. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/ResolutionProto.cs.meta (2)
  51. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/ResolutionProto.cs (15)
  52. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/Header.cs.meta (2)
  53. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/Header.cs (14)
  54. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/EnvironmentParametersProto.cs.meta (2)
  55. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/EnvironmentParametersProto.cs (21)
  56. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/EngineConfigurationProto.cs.meta (2)
  57. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/EngineConfigurationProto.cs (19)
  58. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/CommandProto.cs.meta (2)
  59. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/CommandProto.cs (14)
  60. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/BrainTypeProto.cs.meta (2)
  61. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/BrainTypeProto.cs (18)
  62. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/BrainParametersProto.cs.meta (2)
  63. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/BrainParametersProto.cs (36)
  64. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/AgentInfoProto.cs.meta (2)
  65. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/AgentInfoProto.cs (24)
  66. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/AgentActionProto.cs.meta (2)
  67. MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects/AgentActionProto.cs (16)
  68. MLAgentsSDK/Assets/ML-Agents/Scripts/ActionMasker.cs (0)
  69. MLAgentsSDK/Assets/ML-Agents/Scripts/Academy.cs (2)
  70. ml-agents/tests/trainers/test_curriculum.py (4)
  71. ml-agents/tests/trainers/test_trainer_controller.py (36)
  72. ml-agents/tests/mock_communicator.py (5)
  73. ml-agents/mlagents/envs/communicator_objects/unity_to_external_pb2_grpc.py (10)
  74. ml-agents/mlagents/envs/communicator_objects/unity_to_external_pb2.py (18)
  75. ml-agents/mlagents/envs/communicator_objects/unity_rl_output_pb2.py (54)
  76. ml-agents/mlagents/envs/communicator_objects/unity_rl_input_pb2.py (66)
  77. ml-agents/mlagents/envs/communicator_objects/unity_rl_initialization_output_pb2.py (39)
  78. ml-agents/mlagents/envs/communicator_objects/unity_rl_initialization_input_pb2.py (21)
  79. ml-agents/mlagents/envs/communicator_objects/unity_output_pb2.py (33)
  80. ml-agents/mlagents/envs/communicator_objects/unity_message_pb2.py (39)
  81. ml-agents/mlagents/envs/communicator_objects/unity_input_pb2.py (33)
  82. ml-agents/mlagents/envs/communicator_objects/space_type_proto_pb2.py (25)
  83. ml-agents/mlagents/envs/communicator_objects/resolution_proto_pb2.py (25)
  84. ml-agents/mlagents/envs/communicator_objects/header_pb2.py (23)
  85. ml-agents/mlagents/envs/communicator_objects/environment_parameters_proto_pb2.py (36)
  86. ml-agents/mlagents/envs/communicator_objects/engine_configuration_proto_pb2.py (31)
  87. ml-agents/mlagents/envs/communicator_objects/command_proto_pb2.py (23)
  88. ml-agents/mlagents/envs/communicator_objects/brain_type_proto_pb2.py (29)
  89. ml-agents/mlagents/envs/communicator_objects/brain_parameters_proto_pb2.py (49)
  90. ml-agents/mlagents/envs/communicator_objects/agent_info_proto_pb2.py (41)
  91. ml-agents/mlagents/envs/communicator_objects/agent_action_proto_pb2.py (27)
  92. ml-agents/mlagents/envs/socket_communicator.py (4)
  93. ml-agents/mlagents/envs/rpc_communicator.py (6)
  94. ml-agents/mlagents/envs/exception.py (4)
  95. ml-agents/mlagents/envs/environment.py (4)
  96. ml-agents/mlagents/envs/communicator.py (4)
  97. ml-agents/mlagents/envs/brain.py (0)
  98. ml-agents/mlagents/envs/__init__.py (3)
  99. notebooks/getting-started.ipynb (21)
  100. ml-agents/mlagents/trainers/trainer_controller.py (16)

.gitignore (49)


/unity-environment/[Ll]ibrary/
/unity-environment/[Tt]emp/
/unity-environment/[Oo]bj/
/unity-environment/[Bb]uild/
/unity-environment/[Bb]uilds/
/unity-environment/[Pp]ackages/
/unity-environment/[Uu]nity[Pp]ackage[Mm]anager/
/unity-environment/Assets/AssetStoreTools*
/unity-environment/Assets/Plugins*
/unity-environment/Assets/Gizmos*
/MLAgentsSDK/[Ll]ibrary/
/MLAgentsSDK/[Tt]emp/
/MLAgentsSDK/[Oo]bj/
/MLAgentsSDK/[Bb]uild/
/MLAgentsSDK/[Bb]uilds/
/MLAgentsSDK/[Pp]ackages/
/MLAgentsSDK/[Uu]nity[Pp]ackage[Mm]anager/
/MLAgentsSDK/Assets/AssetStoreTools*
/MLAgentsSDK/Assets/Plugins*
/MLAgentsSDK/Assets/Gizmos*
python/models
python/summaries
*unity-environment.log
*MLAgentsSDK.log
/unity-environment/.vs/
/MLAgentsSDK/.vs/
/unity-environmentExportedObj/
/unity-environment.consulo/
/MLAgentsSDKExportedObj/
/MLAgentsSDK.consulo/
*.csproj
*.unityproj
*.sln

*.pidb.meta
# Unity3D Generated File On Crash Reports
/unity-environment/sysinfo.txt
/MLAgentsSDK/sysinfo.txt
# Builds
*.apk

*.x86
# Tensorflow Sharp Files
/unity-environment/Assets/ML-Agents/Plugins/Android*
/unity-environment/Assets/ML-Agents/Plugins/iOS*
/unity-environment/Assets/ML-Agents/Plugins/Computer*
/unity-environment/Assets/ML-Agents/Plugins/System*
/MLAgentsSDK/Assets/ML-Agents/Plugins/Android*
/MLAgentsSDK/Assets/ML-Agents/Plugins/iOS*
/MLAgentsSDK/Assets/ML-Agents/Plugins/Computer*
/MLAgentsSDK/Assets/ML-Agents/Plugins/System*
# Generated doc folders
/docs/html

# pytest cache
*.pytest_cache/
# Ignore compiled protobuf files.
ml-agents-protobuf/cs
ml-agents-protobuf/python
ml-agents-protobuf/Grpc*
# Ignore PyPi build files.
dist/
build/

Dockerfile (10)


RUN apt-get install -y xvfb
ADD python/requirements.txt .
COPY ml-agents/requirements.txt .
WORKDIR /execute
COPY python /execute/python
COPY README.md .
COPY ml-agents /ml-agents
WORKDIR /ml-agents
RUN pip install .
ENTRYPOINT ["python", "python/learn.py"]
ENTRYPOINT ["python", "mlagents/learn.py"]

docs/API-Reference.md (25)


# API Reference
Our developer-facing C# classes (Academy, Agent, Decision and
Monitor) have been documented to be compatible with
[Doxygen](http://www.stack.nl/~dimitri/doxygen/) for auto-generating HTML
Our developer-facing C# classes (Academy, Agent, Decision and Monitor) have been
documented to be compatible with
[Doxygen](http://www.stack.nl/~dimitri/doxygen/) for auto-generating HTML
To generate the API reference,
[download Doxygen](http://www.stack.nl/~dimitri/doxygen/download.html) and run
the following command within the `docs/` directory:
To generate the API reference, [download
Doxygen](http://www.stack.nl/~dimitri/doxygen/download.html) and run the
following command within the `docs/` directory:
that includes the classes that have been properly formatted.
The generated HTML files will be placed
in the `html/` subdirectory. Open `index.html` within that subdirectory to
navigate to the API reference home. Note that `html/` is already included in
the repository's `.gitignore` file.
that includes the classes that have been properly formatted. The generated HTML
files will be placed in the `html/` subdirectory. Open `index.html` within that
subdirectory to navigate to the API reference home. Note that `html/` is already
included in the repository's `.gitignore` file.
In the near future, we aim to expand our documentation
to include all the Unity C# classes and Python API.
In the near future, we aim to expand our documentation to include all the Unity
C# classes and Python API.

docs/Background-Jupyter.md (15)


# Background: Jupyter
[Jupyter](https://jupyter.org) is a fantastic tool for writing code with
embedded visualizations. We provide one such notebook, `python/notebooks/getting-started.ipynb`,
for testing the Python control interface to a Unity build. This notebook is
introduced in the
[Getting Started with the 3D Balance Ball Environment](Getting-Started-with-Balance-Ball.md)
[Jupyter](https://jupyter.org) is a fantastic tool for writing code with
embedded visualizations. We provide one such notebook,
`notebooks/getting-started.ipynb`, for testing the Python control
interface to a Unity build. This notebook is introduced in the [Getting Started
with the 3D Balance Ball Environment](Getting-Started-with-Balance-Ball.md)
in the _Jupyter/IPython Quick Start Guide_. To launch Jupyter, run in the command line:
in the _Jupyter/IPython Quick Start Guide_. To launch Jupyter, run in the
command line:
`jupyter notebook`
jupyter notebook
Then navigate to `localhost:8888` to access your notebooks.
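For readers skimming this diff, the control interface that the relocated notebook exercises boils down to a short loop. The sketch below is illustrative only: the post-restructure import path (`mlagents.envs`) and the `3DBall` build name are assumptions about a local setup, not something this PR guarantees.

```python
# Minimal sketch of the loop notebooks/getting-started.ipynb walks through.
# The import path and the "3DBall" build name are assumptions for illustration.
import numpy as np
from mlagents.envs import UnityEnvironment

env = UnityEnvironment(file_name="3DBall")     # file_name=None would attach to the Editor
brain_name = env.brain_names[0]                # first brain in the scene
brain = env.brains[brain_name]

info = env.reset(train_mode=False)[brain_name]
action_size = brain.vector_action_space_size
if isinstance(action_size, (list, tuple)):     # some versions report a list of branch sizes
    action_size = action_size[0]

for _ in range(100):
    # Random actions, one row per agent sharing this brain.
    action = np.random.randn(len(info.agents), action_size)
    info = env.step(action)[brain_name]

env.close()
```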

docs/Basic-Guide.md (146)


# Basic Guide
This guide will show you how to use a pretrained model in an example Unity environment, and show you how to train the model yourself.
This guide will show you how to use a pretrained model in an example Unity
environment, and show you how to train the model yourself.
If you are not familiar with the [Unity Engine](https://unity3d.com/unity),
we highly recommend the [Roll-a-ball tutorial](https://unity3d.com/learn/tutorials/s/roll-ball-tutorial) to learn all the basic concepts of Unity.
If you are not familiar with the [Unity Engine](https://unity3d.com/unity), we
highly recommend the [Roll-a-ball
tutorial](https://unity3d.com/learn/tutorials/s/roll-ball-tutorial) to learn all
the basic concepts of Unity.
In order to use the ML-Agents toolkit within Unity, you need to change some Unity settings first. Also, the [TensorFlowSharp plugin](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage) is needed to use a pretrained model within Unity; the plugin is based on the [TensorFlowSharp repo](https://github.com/migueldeicaza/TensorFlowSharp).
In order to use the ML-Agents toolkit within Unity, you need to change some
Unity settings first. Also, the [TensorFlowSharp
plugin](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage)
is needed to use a pretrained model within Unity; the plugin is based on the
[TensorFlowSharp repo](https://github.com/migueldeicaza/TensorFlowSharp).
3. Using the file dialog that opens, locate the `unity-environment` folder within the ML-Agents toolkit project and click **Open**.
3. Using the file dialog that opens, locate the `MLAgentsSDK` folder
within the ML-Agents toolkit project and click **Open**.
5. For **each** of the platforms you target
(**PC, Mac and Linux Standalone**, **iOS** or **Android**):
5. For **each** of the platforms you target (**PC, Mac and Linux Standalone**,
**iOS** or **Android**):
2. Select **Scripting Runtime Version** to
**Experimental (.NET 4.6 Equivalent or .NET 4.x Equivalent)**
3. In **Scripting Defined Symbols**, add the flag `ENABLE_TENSORFLOW`.
After typing in the flag name, press Enter.
2. Select **Scripting Runtime Version** to **Experimental (.NET 4.6
Equivalent or .NET 4.x Equivalent)**
3. In **Scripting Defined Symbols**, add the flag `ENABLE_TENSORFLOW`. After
typing in the flag name, press Enter.
[Download](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage) the TensorFlowSharp plugin. Then import it into Unity by double clicking the downloaded file. You can check if it was successfully imported by checking the TensorFlow files in the Project window under **Assets** > **ML-Agents** > **Plugins** > **Computer**.
[Download](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage)
the TensorFlowSharp plugin. Then import it into Unity by double clicking the
downloaded file. You can check if it was successfully imported by checking the
TensorFlow files in the Project window under **Assets** > **ML-Agents** >
**Plugins** > **Computer**.
**Note**: If you don't see anything under **Assets**, drag the `ml-agents/unity-environment/Assets/ML-Agents` folder under **Assets** within Project window.
**Note**: If you don't see anything under **Assets**, drag the
`MLAgentsSDK/Assets/ML-Agents` folder under **Assets** within
Project window.
1. In the **Project** window, go to `Assets/ML-Agents/Examples/3DBall` folder and open the `3DBall` scene file.
2. In the **Hierarchy** window, select the **Ball3DBrain** child under the **Ball3DAcademy** GameObject to view its properties in the Inspector window.
3. On the **Ball3DBrain** object's **Brain** component, change the **Brain Type** to **Internal**.
4. In the **Project** window, locate the `Assets/ML-Agents/Examples/3DBall/TFModels` folder.
5. Drag the `3DBall` model file from the `TFModels` folder to the **Graph Model** field of the **Ball3DBrain** object's **Brain** component.
5. Click the **Play** button and you will see the platforms balance the balls using the pretrained model.
1. In the **Project** window, go to `Assets/ML-Agents/Examples/3DBall` folder
and open the `3DBall` scene file.
2. In the **Hierarchy** window, select the **Ball3DBrain** child under the
**Ball3DAcademy** GameObject to view its properties in the Inspector window.
3. On the **Ball3DBrain** object's **Brain** component, change the **Brain
Type** to **Internal**.
4. In the **Project** window, locate the
`Assets/ML-Agents/Examples/3DBall/TFModels` folder.
5. Drag the `3DBall` model file from the `TFModels` folder to the **Graph
Model** field of the **Ball3DBrain** object's **Brain** component.
6. Click the **Play** button and you will see the platforms balance the balls
using the pretrained model.
The `python/Basics` [Jupyter notebook](Background-Jupyter.md) contains a
simple walkthrough of the functionality of the Python
API. It can also serve as a simple test that your environment is configured
correctly. Within `Basics`, be sure to set `env_name` to the name of the
Unity executable if you want to [use an executable](Learning-Environment-Executable.md) or to `None` if you want to interact with the current scene in the Unity Editor.
The `notebooks/getting-started.ipynb` [Jupyter notebook](Background-Jupyter.md)
contains a simple walkthrough of the functionality of the Python API. It can
also serve as a simple test that your environment is configured correctly.
Within `Basics`, be sure to set `env_name` to the name of the Unity executable
if you want to [use an executable](Learning-Environment-Executable.md) or to
`None` if you want to interact with the current scene in the Unity Editor.
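The `env_name` switch described above comes down to a single constructor argument. A hypothetical notebook cell (names are illustrative, not from this diff) might look like:

```python
# Hypothetical cell showing the env_name switch described above.
from mlagents.envs import UnityEnvironment

env_name = None          # None: press Play in the Unity Editor when prompted
# env_name = "./3DBall"  # or point at a built executable instead

env = UnityEnvironment(file_name=env_name)
print(env.brain_names)   # lists the brains found in the scene
env.close()
```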
Since we are going to build this environment to conduct training, we need to
set the brain used by the agents to **External**. This allows the agents to
Since we are going to build this environment to conduct training, we need to set
the brain used by the agents to **External**. This allows the agents to
1. In the **Scene** window, click the triangle icon next to the Ball3DAcademy
object.
1. In the **Scene** window, click the triangle icon next to the Ball3DAcademy
object.
2. Select its child object **Ball3DBrain**.
3. In the Inspector window, set **Brain Type** to **External**.

1. Open a command or terminal window.
2. Navigate to the folder where you installed the ML-Agents toolkit.
3. Change to the `python` directory.
4. Run `python3 learn.py --run-id=<run-identifier> --train`
Where:
- `<run-identifier>` is a string used to separate the results of different training runs
- And the `--train` flag tells learn.py to run a training session (rather than inference)
5. When the message _"Start training by pressing the Play button in the Unity Editor"_ is displayed on the screen, you can press the :arrow_forward: button in Unity to start training in the Editor.
1. Open a command or terminal window.
2. Navigate to the folder where you installed the ML-Agents toolkit.
3. Run `mlagents-learn <trainer-config-path> --run-id=<run-identifier> --train`
Where:
- `<trainer-config-path>` is the relative or absolute filepath of the
trainer configuration. The defaults used by environments in the ML-Agents
SDK can be found in `config/trainer_config.yaml`.
- `<run-identifier>` is a string used to separate the results of different
training runs
- And the `--train` flag tells `mlagents-learn` to run a training session (rather
than inference)
5. When the message _"Start training by pressing the Play button in the Unity
Editor"_ is displayed on the screen, you can press the :arrow_forward: button
in Unity to start training in the Editor.
**Note**: Alternatively, you can use an executable rather than the Editor to perform training. Please refer to [this page](Learning-Environment-Executable.md) for instructions on how to build and use an executable.
**Note**: Alternatively, you can use an executable rather than the Editor to
perform training. Please refer to [this
page](Learning-Environment-Executable.md) for instructions on how to build and
use an executable.
**Note**: If you're using Anaconda, don't forget to activate the ml-agents environment first.
**Note**: If you're using Anaconda, don't forget to activate the ml-agents
environment first.
If `learn.py` runs correctly and starts training, you should see something like this:
If `mlagents-learn` runs correctly and starts training, you should see something
like this:
You can press Ctrl+C to stop the training, and your trained model will be at `ml-agents/python/models/<run-identifier>/editor_<academy_name>_<run-identifier>.bytes` where `<academy_name>` is the name of the Academy GameObject in the current scene. This file corresponds to your model's latest checkpoint. You can now embed this trained model into your internal brain by following the steps below, which is similar to the steps described [above](#play-an-example-environment-using-pretrained-model).
1. Move your model file into
`unity-environment/Assets/ML-Agents/Examples/3DBall/TFModels/`.
You can press Ctrl+C to stop the training, and your trained model will be at
`models/<run-identifier>/editor_<academy_name>_<run-identifier>.bytes` where
`<academy_name>` is the name of the Academy GameObject in the current scene.
This file corresponds to your model's latest checkpoint. You can now embed this
trained model into your internal brain by following the steps below, which is
similar to the steps described
[above](#play-an-example-environment-using-pretrained-model).
1. Move your model file into
`MLAgentsSDK/Assets/ML-Agents/Examples/3DBall/TFModels/`.
5. Drag the `<env_name>_<run-identifier>.bytes` file from the Project window of the Editor
to the **Graph Model** placeholder in the **Ball3DBrain** inspector window.
5. Drag the `<env_name>_<run-identifier>.bytes` file from the Project window of
the Editor to the **Graph Model** placeholder in the **Ball3DBrain**
inspector window.
* For more information on the ML-Agents toolkit, in addition to helpful background, check out the [ML-Agents Toolkit Overview](ML-Agents-Overview.md) page.
* For a more detailed walk-through of our 3D Balance Ball environment, check out the [Getting Started](Getting-Started-with-Balance-Ball.md) page.
* For a "Hello World" introduction to creating your own learning environment, check out the [Making a New Learning Environment](Learning-Environment-Create-New.md) page.
* For a series of YouTube video tutorials, check out the [Machine Learning Agents PlayList](https://www.youtube.com/playlist?list=PLX2vGYjWbI0R08eWQkO7nQkGiicHAX7IX) page.
- For more information on the ML-Agents toolkit, in addition to helpful
background, check out the [ML-Agents Toolkit Overview](ML-Agents-Overview.md)
page.
- For a more detailed walk-through of our 3D Balance Ball environment, check out
the [Getting Started](Getting-Started-with-Balance-Ball.md) page.
- For a "Hello World" introduction to creating your own learning environment,
check out the [Making a New Learning
Environment](Learning-Environment-Create-New.md) page.
- For a series of YouTube video tutorials, check out the [Machine Learning Agents
PlayList](https://www.youtube.com/playlist?list=PLX2vGYjWbI0R08eWQkO7nQkGiicHAX7IX)
page.

docs/FAQ.md (113)


# Frequently Asked Questions
### Scripting Runtime Environment not setup correctly
## Scripting Runtime Environment not setup correctly
If you haven't switched your scripting runtime version from .NET 3.5 to .NET 4.6 or .NET 4.x, you will see an error message such as:
If you haven't switched your scripting runtime version from .NET 3.5 to .NET 4.6
or .NET 4.x, you will see an error message such as:
This is because .NET 3.5 doesn't support the Clear() method for StringBuilder; refer to [Setting Up The ML-Agents Toolkit Within Unity](Installation.md#setting-up-ml-agent-within-unity) for a solution.
This is because .NET 3.5 doesn't support the Clear() method for StringBuilder; refer
to [Setting Up The ML-Agents Toolkit Within
Unity](Installation.md#setting-up-ml-agent-within-unity) for a solution.
### TensorFlowSharp flag not turned on.
## TensorFlowSharp flag not turned on
If you have already imported the TensorFlowSharp plugin, but haven't set the ENABLE_TENSORFLOW flag for your scripting define symbols, you will see the following error message:
If you have already imported the TensorFlowSharp plugin, but haven't set the
ENABLE_TENSORFLOW flag for your scripting define symbols, you will see the
following error message:
You need to install and enable the TensorFlowSharp plugin in order to use the internal brain.
This error message occurs because the TensorFlowSharp plugin won't be usable without the ENABLE_TENSORFLOW flag; refer to [Setting Up The ML-Agents Toolkit Within Unity](Installation.md#setting-up-ml-agent-within-unity) for a solution.
This error message occurs because the TensorFlowSharp plugin won't be usable
without the ENABLE_TENSORFLOW flag; refer to [Setting Up The ML-Agents Toolkit
Within Unity](Installation.md#setting-up-ml-agent-within-unity) for a solution.
### Tensorflow epsilon placeholder error
## Tensorflow epsilon placeholder error
If you have a graph placeholder set in the internal Brain inspector that is not present in the TensorFlow graph, you will see an error like this:
If you have a graph placeholder set in the internal Brain inspector that is not
present in the TensorFlow graph, you will see an error like this:
UnityAgentsException: One of the Tensorflow placeholder could not be found. In brain <some_brain_name>, there are no FloatingPoint placeholder named <some_placeholder_name>.
Solution: Go to each of your Brain objects, find `Graph placeholders` and change its `size` to 0 to remove the `epsilon` placeholder.
Solution: Go to each of your Brain objects, find `Graph placeholders` and change
its `size` to 0 to remove the `epsilon` placeholder.
Similarly, if the graph scope in the internal Brain inspector is not set correctly, you will see an error like this:
Similarly, if you have a graph scope set in the internal Brain inspector that is
not set correctly, you will see an error like this:
Solution: Make sure your Graph Scope field matches the corresponding brain object name in your Hierarchy Inspector when there are multiple brains.
Solution: Make sure your Graph Scope field matches the corresponding brain
object name in your Hierarchy Inspector when there are multiple brains.
### Environment Permission Error
## Environment Permission Error
If you directly import your Unity environment without building it in the
editor, you might need to give it additional permissions to execute it.
`chmod -R 755 *.app`
```shell
chmod -R 755 *.app
```
`chmod -R 755 *.x86_64`
```shell
chmod -R 755 *.x86_64
```
On Windows, you can find
### Environment Connection Timeout
## Environment Connection Timeout
If you are able to launch the environment from `UnityEnvironment` but then
receive a timeout error, there may be a number of possible causes.
If you are able to launch the environment from `UnityEnvironment` but
then receive a timeout error, there may be a number of possible causes.
* _Cause_: There may be no Brains in your environment which are set
to `External`. In this case, the environment will not attempt to
communicate with Python. _Solution_: Set the Brain(s) you wish to
externally control through the Python API to `External` from the
Unity Editor, and rebuild the environment.
* _Cause_: On OSX, the firewall may be preventing communication with
the environment. _Solution_: Add the built environment binary to the
list of exceptions on the firewall by following
[instructions](https://support.apple.com/en-us/HT201642).
* _Cause_: An error happened in the Unity Environment preventing
communication. _Solution_: Look into the
[log files](https://docs.unity3d.com/Manual/LogFiles.html)
generated by the Unity Environment to figure out what error happened.
* _Cause_: There may be no Brains in your environment which are set to
`External`. In this case, the environment will not attempt to communicate
with Python. _Solution_: Set the Brain(s) you wish to externally control
through the Python API to `External` from the Unity Editor, and rebuild the
environment.
* _Cause_: On OSX, the firewall may be preventing communication with the
environment. _Solution_: Add the built environment binary to the list of
exceptions on the firewall by following
[instructions](https://support.apple.com/en-us/HT201642).
* _Cause_: An error happened in the Unity Environment preventing communication.
_Solution_: Look into the [log
files](https://docs.unity3d.com/Manual/LogFiles.html) generated by the Unity
Environment to figure out what error happened.
### Communication port {} still in use
## Communication port {} still in use
If you receive an exception `"Couldn't launch new environment because
communication port {} is still in use. "`, you can change the worker
number in the Python script when calling
If you receive an exception `"Couldn't launch new environment because
communication port {} is still in use. "`, you can change the worker number in
the Python script when calling
`UnityEnvironment(file_name=filename, worker_id=X)`
```python
UnityEnvironment(file_name=filename, worker_id=X)
```
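Since `worker_id` simply offsets the port each environment binds, one way to avoid the collision is to give every concurrent environment its own id. A small sketch follows; the build path is an assumption for illustration.

```python
# Each concurrent UnityEnvironment gets its own worker_id, and therefore its own port.
# The "./3DBall" path is an assumption for illustration.
from mlagents.envs import UnityEnvironment

envs = [UnityEnvironment(file_name="./3DBall", worker_id=i) for i in range(2)]
for env in envs:
    print(env.brain_names)   # both instances are reachable on distinct ports
for env in envs:
    env.close()
```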
### Mean reward : nan
## Mean reward : nan
If you receive a message `Mean reward : nan` when attempting to train a
model using PPO, this is due to the episodes of the learning environment
not terminating. In order to address this, set `Max Steps` for either
the Academy or Agents within the Scene Inspector to a value greater
than 0. Alternatively, it is possible to manually set `done` conditions
for episodes from within scripts for custom episode-terminating events.
If you receive a message `Mean reward : nan` when attempting to train a model
using PPO, this is due to the episodes of the learning environment not
terminating. In order to address this, set `Max Steps` for either the Academy or
Agents within the Scene Inspector to a value greater than 0. Alternatively, it
is possible to manually set `done` conditions for episodes from within scripts
for custom episode-terminating events.
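A quick way to confirm that diagnosis from the Python side is to step the environment with random actions and count how often agents report being done; a run that never sets `done` is exactly the situation described above. This is a rough sketch, with the build path and step budget as assumptions:

```python
# Rough diagnostic sketch: if no agent ever reports done=True, episodes never
# terminate and PPO's mean reward will show as nan. Build path is an assumption.
import numpy as np
from mlagents.envs import UnityEnvironment

env = UnityEnvironment(file_name="./3DBall")
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

info = env.reset(train_mode=True)[brain_name]
action_size = brain.vector_action_space_size
if isinstance(action_size, (list, tuple)):
    action_size = action_size[0]

done_count = 0
for _ in range(2000):
    action = np.random.randn(len(info.agents), action_size)
    info = env.step(action)[brain_name]
    done_count += sum(info.local_done)

env.close()
print("episode terminations seen:", done_count)  # 0 suggests missing Max Steps / done logic
```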

docs/Feature-Memory.md (2)


track of what is important to remember with [LSTM](https://en.wikipedia.org/wiki/Long_short-term_memory).
## How to use
When configuring the trainer parameters in the `trainer_config.yaml`
When configuring the trainer parameters in the `config/trainer_config.yaml`
file, add the following parameters to the Brain you want to use.
```json

docs/Getting-Started-with-Balance-Ball.md (380)


# Getting Started with the 3D Balance Ball Environment
This tutorial walks through the end-to-end process of opening an ML-Agents toolkit
example environment in Unity, building the Unity executable, training an agent
in it, and finally embedding the trained model into the Unity environment.
This tutorial walks through the end-to-end process of opening an ML-Agents
toolkit example environment in Unity, building the Unity executable, training an
agent in it, and finally embedding the trained model into the Unity environment.
The ML-Agents toolkit includes a number of [example environments](Learning-Environment-Examples.md)
which you can examine to help understand the different ways in which the ML-Agents toolkit
can be used. These environments can also serve as templates for new
environments or as ways to test new ML algorithms. After reading this tutorial,
you should be able to explore and build the example environments.
The ML-Agents toolkit includes a number of [example
environments](Learning-Environment-Examples.md) which you can examine to help
understand the different ways in which the ML-Agents toolkit can be used. These
environments can also serve as templates for new environments or as ways to test
new ML algorithms. After reading this tutorial, you should be able to explore
and build the example environments.
This walk-through uses the **3D Balance Ball** environment. 3D Balance Ball contains
a number of platforms and balls (which are all copies of each other).
Each platform tries to keep its ball from falling by rotating either
horizontally or vertically. In this environment, a platform is an **agent**
that receives a reward for every step that it balances the ball. An agent is
also penalized with a negative reward for dropping the ball. The goal of the
training process is to have the platforms learn to never drop the ball.
This walk-through uses the **3D Balance Ball** environment. 3D Balance Ball
contains a number of platforms and balls (which are all copies of each other).
Each platform tries to keep its ball from falling by rotating either
horizontally or vertically. In this environment, a platform is an **agent** that
receives a reward for every step that it balances the ball. An agent is also
penalized with a negative reward for dropping the ball. The goal of the training
process is to have the platforms learn to never drop the ball.
In order to install and set up the ML-Agents toolkit, the Python dependencies and Unity,
see the [installation instructions](Installation.md).
In order to install and set up the ML-Agents toolkit, the Python dependencies
and Unity, see the [installation instructions](Installation.md).
An agent is an autonomous actor that observes and interacts with an
_environment_. In the context of Unity, an environment is a scene containing
an Academy and one or more Brain and Agent objects, and, of course, the other
An agent is an autonomous actor that observes and interacts with an
_environment_. In the context of Unity, an environment is a scene containing an
Academy and one or more Brain and Agent objects, and, of course, the other
**Note:** In Unity, the base object of everything in a scene is the
_GameObject_. The GameObject is essentially a container for everything else,
including behaviors, graphics, physics, etc. To see the components that make
up a GameObject, select the GameObject in the Scene window, and open the
Inspector window. The Inspector shows every component on a GameObject.
The first thing you may notice after opening the 3D Balance Ball scene is that
it contains not one, but several platforms. Each platform in the scene is an
independent agent, but they all share the same brain. 3D Balance Ball does this
**Note:** In Unity, the base object of everything in a scene is the
_GameObject_. The GameObject is essentially a container for everything else,
including behaviors, graphics, physics, etc. To see the components that make up
a GameObject, select the GameObject in the Scene window, and open the Inspector
window. The Inspector shows every component on a GameObject.
The first thing you may notice after opening the 3D Balance Ball scene is that
it contains not one, but several platforms. Each platform in the scene is an
independent agent, but they all share the same brain. 3D Balance Ball does this
The Academy object for the scene is placed on the Ball3DAcademy GameObject.
When you look at an Academy component in the inspector, you can see several
properties that control how the environment works. For example, the
**Training** and **Inference Configuration** properties set the graphics and
timescale properties for the Unity application. The Academy uses the
**Training Configuration** during training and the **Inference Configuration**
when not training. (*Inference* means that the agent is using a trained model
or heuristics or direct control — in other words, whenever **not** training.)
Typically, you set low graphics quality and a high time scale for the
**Training configuration** and a high graphics quality and the timescale to
`1.0` for the **Inference Configuration** .
The Academy object for the scene is placed on the Ball3DAcademy GameObject. When
you look at an Academy component in the inspector, you can see several
properties that control how the environment works. For example, the **Training**
and **Inference Configuration** properties set the graphics and timescale
properties for the Unity application. The Academy uses the **Training
Configuration** during training and the **Inference Configuration** when not
training. (*Inference* means that the agent is using a trained model or
heuristics or direct control — in other words, whenever **not** training.)
Typically, you set low graphics quality and a high time scale for the **Training
configuration** and a high graphics quality and the timescale to `1.0` for the
**Inference Configuration** .
**Note:** if you want to observe the environment during training, you can
adjust the **Inference Configuration** settings to use a larger window and a
timescale closer to 1:1. Be sure to set these parameters back when training in
earnest; otherwise, training can take a very long time.
**Note:** if you want to observe the environment during training, you can adjust
the **Inference Configuration** settings to use a larger window and a timescale
closer to 1:1. Be sure to set these parameters back when training in earnest;
otherwise, training can take a very long time.
Another aspect of an environment to look at is the Academy implementation.
Since the base Academy class is abstract, you must always define a subclass.
There are three functions you can implement, though they are all optional:
Another aspect of an environment to look at is the Academy implementation. Since
the base Academy class is abstract, you must always define a subclass. There are
three functions you can implement, though they are all optional:
* Academy.AcademyStep() — Called at every simulation step before
Agent.AgentAction() (and after the agents collect their observations).
* Academy.AcademyReset() — Called when the Academy starts or restarts the
simulation (including the first time).
The 3D Balance Ball environment does not use these functions — each agent
resets itself when needed — but many environments do use these functions to
control the environment around the agents.
The 3D Balance Ball environment does not use these functions — each agent resets
itself when needed — but many environments do use these functions to control the
environment around the agents.
The Ball3DBrain GameObject in the scene, which contains a Brain component,
is a child of the Academy object. (All Brain objects in a scene must be
children of the Academy.) All the agents in the 3D Balance Ball environment
use the same Brain instance.
A Brain doesn't store any information about an agent,
it just routes the agent's collected observations to the decision making
process and returns the chosen action to the agent. Thus, all agents can share
the same brain, but act independently. The Brain settings tell you quite a bit
about how an agent works.
The Ball3DBrain GameObject in the scene, which contains a Brain component, is a
child of the Academy object. (All Brain objects in a scene must be children of
the Academy.) All the agents in the 3D Balance Ball environment use the same
Brain instance. A Brain doesn't store any information about an agent, it just
routes the agent's collected observations to the decision making process and
returns the chosen action to the agent. Thus, all agents can share the same
brain, but act independently. The Brain settings tell you quite a bit about how
an agent works.
The **Brain Type** determines how an agent makes its decisions. The
**External** and **Internal** types work together — use **External** when
training your agents; use **Internal** when using the trained model.
The **Heuristic** brain allows you to hand-code the agent's logic by extending
the Decision class. Finally, the **Player** brain lets you map keyboard
commands to actions, which can be useful when testing your agents and
environment. If none of these types of brains do what you need, you can
implement your own CoreBrain to create your own type.
The **Brain Type** determines how an agent makes its decisions. The **External**
and **Internal** types work together — use **External** when training your
agents; use **Internal** when using the trained model. The **Heuristic** brain
allows you to hand-code the agent's logic by extending the Decision class.
Finally, the **Player** brain lets you map keyboard commands to actions, which
can be useful when testing your agents and environment. If none of these types
of brains do what you need, you can implement your own CoreBrain to create your
own type.
In this tutorial, you will set the **Brain Type** to **External** for training;
**Vector Observation Space**
#### Vector Observation Space
Before making a decision, an agent collects its observation about its state
in the world. The vector observation is a vector of floating point numbers
which contain relevant information for the agent to make decisions.
Before making a decision, an agent collects its observation about its state in
the world. The vector observation is a vector of floating point numbers which
contain relevant information for the agent to make decisions.
The Brain instance used in the 3D Balance Ball example uses the **Continuous**
vector observation space with a **State Size** of 8. This means that the
feature vector containing the agent's observations contains eight elements:
the `x` and `z` components of the platform's rotation and the `x`, `y`, and `z`
components of the ball's relative position and velocity. (The observation
values are defined in the agent's `CollectObservations()` function.)
The Brain instance used in the 3D Balance Ball example uses the **Continuous**
vector observation space with a **State Size** of 8. This means that the feature
vector containing the agent's observations contains eight elements: the `x` and
`z` components of the platform's rotation and the `x`, `y`, and `z` components
of the ball's relative position and velocity. (The observation values are
defined in the agent's `CollectObservations()` function.)
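From the Python side, that eight-element observation shows up as one row per agent in the `BrainInfo` the environment returns. A quick check might look like the sketch below; the import path and build name are assumptions about a local setup.

```python
# Sketch: inspect the 8-element observation described above from the Python API.
# The import path and "./3DBall" build name are assumptions for illustration.
from mlagents.envs import UnityEnvironment

env = UnityEnvironment(file_name="./3DBall")
brain_name = env.brain_names[0]
info = env.reset(train_mode=False)[brain_name]

# Expect one row per agent and eight columns (assuming no observation stacking):
# platform rotation (x, z) plus the ball's relative position and velocity (x, y, z).
print(info.vector_observations.shape)
env.close()
```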
**Vector Action Space**
#### Vector Action Space
An agent is given instructions from the brain in the form of *actions*.
ML-Agents toolkit classifies actions into two types: the **Continuous**
vector action space is a vector of numbers that can vary continuously. What
each element of the vector means is defined by the agent logic (the PPO
training process just learns what values are better given particular state
observations based on the rewards received when it tries different values).
For example, an element might represent a force or torque applied to a
`RigidBody` in the agent. The **Discrete** action vector space defines its
actions as tables. An action given to the agent is an array of indices into
tables.
An agent is given instructions from the brain in the form of *actions*.
ML-Agents toolkit classifies actions into two types: the **Continuous** vector
action space is a vector of numbers that can vary continuously. What each
element of the vector means is defined by the agent logic (the PPO training
process just learns what values are better given particular state observations
based on the rewards received when it tries different values). For example, an
element might represent a force or torque applied to a `RigidBody` in the agent.
The **Discrete** action vector space defines its actions as tables. An action
given to the agent is an array of indices into tables.
space. You can try training with both settings to observe whether there is a
difference. (Set the `Vector Action Space Size` to 4 when using the discrete
The Agent is the actor that observes and takes actions in the environment.
In the 3D Balance Ball environment, the Agent components are placed on the
twelve Platform GameObjects. The base Agent object has a few properties that
affect its behavior:
The Agent is the actor that observes and takes actions in the environment. In
the 3D Balance Ball environment, the Agent components are placed on the twelve
Platform GameObjects. The base Agent object has a few properties that affect its
behavior:
* **Brain** — Every agent must have a Brain. The brain determines how an agent
makes decisions. All the agents in the 3D Balance Ball scene share the same
brain.
observe its environment. 3D Balance Ball does not use camera observations.
* **Max Step** — Defines how many simulation steps can occur before the agent
decides it is done. In 3D Balance Ball, an agent restarts after 5000 steps.
3D Balance Ball sets this true so that the agent restarts after reaching the
**Max Step** count or after dropping the ball.
Perhaps the more interesting aspect of an agent is the Agent subclass
implementation. When you create an agent, you must extend the base Agent class.
* Agent.AgentReset() — Called when the Agent resets, including at the beginning
of a session. The Ball3DAgent class uses the reset function to reset the
platform and ball. The function randomizes the reset values so that the
training generalizes to more than a specific starting position and platform
attitude.
* Agent.CollectObservations() — Called every simulation step. Responsible for
collecting the agent's observations of the environment. Since the Brain
instance assigned to the agent is set to the continuous vector observation
space with a state size of 8, the `CollectObservations()` must call
`AddVectorObs` 8 times.
by the brain. The Ball3DAgent example handles both the continuous and the
discrete action space types. There isn't actually much difference between the
two state types in this environment — both vector action spaces result in a
small change in platform rotation at each step. The `AgentAction()` function
assigns a reward to the agent; in this example, an agent receives a small
positive reward for each step it keeps the ball on the platform and a larger,
negative reward for dropping the ball. An agent is also marked as done when it
drops the ball so that it will reset with a new ball for the next simulation
step.
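In terms of what the Python API hands to the environment, the two action space types differ only in the array contents, roughly as sketched below. The agent count and sizes follow the 3D Balance Ball description above; exact step signatures may vary by version.

```python
# Shape sketch for the two action space types discussed above; illustrative only.
import numpy as np

num_agents = 12                    # 3D Balance Ball uses twelve platform agents

# Continuous: one float per action dimension, per agent (e.g. two rotation torques).
continuous_actions = np.random.uniform(-1.0, 1.0, size=(num_agents, 2))

# Discrete: one integer index per agent, looked up in the brain's action table
# (a Vector Action Space Size of 4, as suggested earlier).
discrete_actions = np.random.randint(0, 4, size=(num_agents, 1))

# Either array would be passed to env.step(...); the returned BrainInfo then carries
# the per-agent rewards and done flags described in the preceding paragraph.
```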
## Training the Brain with Reinforcement Learning

In order to train an agent to correctly balance the ball, we will use a
Reinforcement Learning algorithm called Proximal Policy Optimization (PPO).
This is a method that has been shown to be safe, efficient, and more general
purpose than many other RL algorithms; as such, we have chosen it as the
example algorithm for use with the ML-Agents toolkit. For more information on PPO,
OpenAI has a recent [blog post](https://blog.openai.com/openai-baselines-ppo/)
In order to train an agent to correctly balance the ball, we will use a
Reinforcement Learning algorithm called Proximal Policy Optimization (PPO). This
is a method that has been shown to be safe, efficient, and more general purpose
than many other RL algorithms; as such, we have chosen it as the example
algorithm for use with the ML-Agents toolkit. For more information on PPO, OpenAI
has a recent [blog post](https://blog.openai.com/openai-baselines-ppo/)
To train the agents within the Ball Balance environment, we will be using the
python package. We have provided a convenient script called `mlagents-learn`
which accepts arguments used to configure both training and inference phases.
To train the agents within the Ball Balance environment, we will be using the python
package. We have provided a convenient Python wrapper script called `learn.py` which accepts arguments used to configure both training and inference phases.
We can use `run_id` to identify the experiment and create a folder where the model and summary statistics are stored. When using TensorBoard to observe the training statistics, it helps to set this to a sequential value
for each training run. In other words, "BalanceBall1" for the first run,
"BalanceBall2" or the second, and so on. If you don't, the summaries for
every training run are saved to the same directory and will all be included
on the same graph.
We can use `run_id` to identify the experiment and create a folder where the
model and summary statistics are stored. When using TensorBoard to observe the
training statistics, it helps to set this to a sequential value for each
training run. In other words, "BalanceBall1" for the first run, "BalanceBall2"
for the second, and so on. If you don't, the summaries for every training run are
saved to the same directory and will all be included on the same graph.
To summarize, go to your command line, enter the `ml-agents/python` directory and type:
To summarize, go to your command line, enter the `ml-agents` directory and type:
```shell
python3 learn.py --run-id=<run-identifier> --train
```
```shell
mlagents-learn config/trainer_config.yaml --run-id=<run-identifier> --train
```
When the message _"Start training by pressing the Play button in the Unity Editor"_ is displayed on the screen, you can press the :arrow_forward: button in Unity to start training in the Editor.
**Note**: If you're using Anaconda, don't forget to activate the ml-agents environment first.
When the message _"Start training by pressing the Play button in the Unity
Editor"_ is displayed on the screen, you can press the :arrow_forward: button in
Unity to start training in the Editor.
The `--train` flag tells the ML-Agents toolkit to run in training mode.
**Note**: If you're using Anaconda, don't forget to activate the ml-agents
environment first.
**Note**: You can train using an executable rather than the Editor. To do so, follow the instructions in
[Using an Executable](Learning-Environment-Executable.md).
The `--train` flag tells the ML-Agents toolkit to run in training mode.
**Note**: You can train using an executable rather than the Editor. To do so,
follow the instructions in [Using an
Executable](Learning-Environment-Executable.md).
Once you start training using `learn.py` in the way described in the previous section, the `ml-agents/python` folder will
contain a `summaries` directory. In order to observe the training process
in more detail, you can use TensorBoard. From the command line navigate to `ml-agents/python` folder and run:
Once you start training using `mlagents-learn` in the way described in the previous
section, the `ml-agents` directory will contain a `summaries` directory. In
order to observe the training process in more detail, you can use TensorBoard.
From the command line run:
`tensorboard --logdir=summaries`
```shell
tensorboard --logdir=summaries
```
* Lesson - only interesting when performing
[curriculum training](Training-Curriculum-Learning.md).
This is not used in the 3D Balance Ball environment.
* Cumulative Reward - The mean cumulative episode reward over all agents.
Should increase during a successful training session.
* Entropy - How random the decisions of the model are. Should slowly decrease
during a successful training process. If it decreases too quickly, the `beta`
hyperparameter should be increased.
* Episode Length - The mean length of each episode in the environment for all
agents.
* Learning Rate - How large a step the training algorithm takes as it searches
for the optimal policy. Should decrease over time.
* Lesson - only interesting when performing [curriculum
training](Training-Curriculum-Learning.md). This is not used in the 3D Balance
Ball environment.
* Cumulative Reward - The mean cumulative episode reward over all agents. Should
increase during a successful training session.
* Entropy - How random the decisions of the model are. Should slowly decrease
during a successful training process. If it decreases too quickly, the `beta`
hyperparameter should be increased.
* Episode Length - The mean length of each episode in the environment for all
agents.
* Learning Rate - How large a step the training algorithm takes as it searches
for the optimal policy. Should decrease over time.
much the policy (process for deciding actions) is changing. The magnitude of
this should decrease during a successful training session.
* Value Estimate - The mean value estimate for all states visited by the agent.
Should increase during a successful training session.
well the model is able to predict the value of each state. This should decrease
during a successful training session.
well the model is able to predict the value of each state. This should
decrease during a successful training session.
Once the training process completes, and the training process saves the model
(denoted by the `Saved Model` message) you can add it to the Unity project and
use it with agents having an **Internal** brain type.
**Note:** Do not just close the Unity Window once the `Saved Model` message appears. Either wait for the training process to close the window or press Ctrl+C at the command-line prompt. If you simply close the window manually, the .bytes file containing the trained model is not exported into the ml-agents folder.
Once the training process completes, and the training process saves the model
(denoted by the `Saved Model` message) you can add it to the Unity project and
use it with agents having an **Internal** brain type. **Note:** Do not just
close the Unity Window once the `Saved Model` message appears. Either wait for
the training process to close the window or press Ctrl+C at the command-line
prompt. If you simply close the window manually, the .bytes file containing the
trained model is not exported into the ml-agents folder.
Because TensorFlowSharp support is still experimental, it is disabled by
default. In order to enable it, you must follow these steps. Please note that
To set up the TensorFlowSharp Support, follow the [Setting up ML-Agents Toolkit within Unity](Basic-Guide.md#setting-up-ml-agents-within-unity) section
of the Basic Guide page.
To set up the TensorFlowSharp Support, follow the [Setting up ML-Agents Toolkit
within Unity](Basic-Guide.md#setting-up-ml-agents-within-unity) section of the
Basic Guide page.
To embed the trained model into Unity, follow the later part of the [Training the Brain with Reinforcement Learning](Basic-Guide.md#training-the-brain-with-reinforcement-learning) section of the Basic Guide page.
To embed the trained model into Unity, follow the later part of [Training the
Brain with Reinforcement
Learning](Basic-Guide.md#training-the-brain-with-reinforcement-learning) section
of the Basic Buides page.

68
docs/Glossary.md


# ML-Agents Toolkit Glossary
* **Academy** - Unity Component which controls timing, reset, and
training/inference settings of the environment.
* **Action** - The carrying-out of a decision on the part of an agent within the
environment.
* **Agent** - Unity Component which produces observations and takes actions in
the environment. Agents' actions are determined by decisions produced by a
linked Brain.
* **Brain** - Unity Component which makes decisions for the agents linked to it.
* **Decision** - The specification produced by a Brain for an action to be
carried out given an observation.
* **Editor** - The Unity Editor, which may include any pane (e.g. Hierarchy,
Scene, Inspector).
* **Environment** - The Unity scene which contains Agents, Academy, and Brains.
* **FixedUpdate** - Unity method called each time the game engine is
stepped. ML-Agents logic should be placed here.
* **Frame** - An instance of rendering the main camera for the display.
Corresponds to each `Update` call of the game engine.
* **Observation** - Partial information describing the state of the environment
available to a given agent. (e.g. Vector, Visual, Text)
* **Policy** - Function for producing decisions from observations.
* **Reward** - Signal provided at every step used to indicate desirability of an
agent’s action within the current state of the environment.
* **State** - The underlying properties of the environment (including all agents
within it) at a given time.
* **Step** - Corresponds to each `FixedUpdate` call of the game engine. Is the
smallest atomic change to the state possible.
* **Update** - Unity function called each time a frame is rendered. ML-Agents
logic should not be placed here.
* **External Coordinator** - ML-Agents class responsible for communication with
outside processes (in this case, the Python API).
* **Trainer** - Python class which is responsible for training a given external
brain. Contains the TensorFlow graph which makes decisions for the external
brain.

2
docs/Installation-Windows.md


In our example, the files are located in `C:\Downloads`. After you have either cloned or downloaded the files, from the Anaconda Prompt, change to the `ml-agents` subdirectory inside the ml-agents directory:
```
cd C:\Downloads\ml-agents\ml-agents
```
Make sure you are connected to the internet and then type in the Anaconda Prompt:
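The install command itself is not shown in this excerpt; as a hedged sketch,
installing the package and its dependencies from the directory you just changed
into would typically look like the following (this assumes that directory
contains the package's `setup.py`):

```shell
pip install .
```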

57
docs/Installation.md


_Linux Build Support_ component when installing Unity.
<p align="center">
<img src="images/unity_linux_build_support.png"
alt="Linux Build Support"
<img src="images/unity_linux_build_support.png"
alt="Linux Build Support"
Once installed, you will want to clone the ML-Agents Toolkit GitHub repository.
The `MLAgentsSDK` directory in this repository contains the Unity Assets
to add to your projects. The `python` directory contains python packages
which provide trainers, a python API to interface with Unity, and a package
to interface with OpenAI Gym.
the dependencies listed in the [requirements file](../ml-agents/requirements.txt).
- [TensorFlow](Background-TensorFlow.md)
- [Jupyter](Background-Jupyter.md)
### NOTES
- If you are using Anaconda and are having trouble with TensorFlow, please see
the following
[note](https://www.tensorflow.org/install/install_mac#installing_with_anaconda)
on how to install TensorFlow in an Anaconda environment.
If you are a Windows user who is new to Python and TensorFlow, follow [this
guide](Installation-Windows.md) to set up your Python environment.
[Download](https://www.python.org/downloads/) and install Python 3 if you do not
already have it.
If your Python environment doesn't include `pip`, see these
To install dependencies, enter the `ml-agents/` directory and run from
the command line:
pip install .
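After the install completes, a quick sanity check is to print the trainer CLI's
help text. This is a sketch that assumes the install placed the `mlagents-learn`
entry point on your PATH:

```shell
mlagents-learn --help
```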
If you'd like to use Docker for ML-Agents, please follow
[this guide](Using-Docker.md).
The [Basic Guide](Basic-Guide.md) page contains several short tutorials on
setting up the ML-Agents toolkit within Unity, running a pre-trained model, in
If you run into any problems regarding ML-Agents, refer to our [FAQ](FAQ.md) and
our [Limitations](Limitations.md) pages. If you can't find anything please
make sure to cite relevant information on OS, Python version, and exact error
message (whenever possible).

12
docs/Learning-Environment-Create-New.md


2. In a file system window, navigate to the folder containing your cloned ML-Agents repository.
3. Drag the `ML-Agents` folder from `MLAgentsSDK/Assets` to the Unity Editor Project window.
Your Unity **Project** window should contain the following assets:

Press **Play** to run the scene and use the WASD keys to move the agent around the platform. Make sure that there are no errors displayed in the Unity editor Console window and that the agent resets when it reaches its target or falls from the platform. Note that for more involved debugging, the ML-Agents SDK includes a convenient Monitor class that you can use to easily display agent status information in the Game window.
One additional test you can perform is to first ensure that your environment and
the Python API work as expected using the `notebooks/getting-started.ipynb`
[Jupyter notebook](Background-Jupyter.md). Within the notebook, be sure to set
`env_name` to the name of the environment file you specify when building this
environment.
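For reference, the check that notebook performs boils down to loading your build
through the Python API. A minimal sketch, assuming the `mlagents.envs` package
from this release (the file name `MyEnvironment` is a placeholder for your own
build, and attribute names may differ between versions):

```python
from mlagents.envs import UnityEnvironment

env_name = "MyEnvironment"  # placeholder: your executable's name (no extension)
env = UnityEnvironment(file_name=env_name)

# Reset the environment and peek at the first external brain's observations.
brain_name = env.brain_names[0]
info = env.reset(train_mode=True)[brain_name]
print(info.vector_observations)

env.close()
```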
Now you can train the Agent. To get ready for training, you must first change the **Brain Type** from **Player** to **External**. From there, the process is the same as described in [Training ML-Agents](Training-ML-Agents.md).

2
docs/Learning-Environment-Design-External-Internal-Brains.md


4. Once the `environment.bytes` file is imported, drag it from the **Project** window to the **Graph Model** field of the Brain component.
If you are using a model produced by the ML-Agents `mlagents-learn` command, use the default values for the other Internal Brain parameters.
### Internal Brain properties

2
docs/Learning-Environment-Examples.md


The Unity ML-Agents toolkit contains an expanding set of example environments which
demonstrate various features of the platform. Environments are located in
`MLAgentsSDK/Assets/ML-Agents/Examples` and summarized below.
Additionally, our
[first ML Challenge](https://connect.unity.com/challenges/ml-agents-1)
contains environments created by the community.

28
docs/Learning-Environment-Executable.md


1. Launch Unity.
2. On the Projects dialog, choose the **Open** option at the top of the window.
3. Using the file dialog that opens, locate the `MLAgentsSDK` folder
within the ML-Agents project and click **Open**.
4. In the **Project** window, navigate to the folder
`Assets/ML-Agents/Examples/3DBall/`.

the 3DBall Scene is the only one checked. (If the list is empty, then only the
current scene is included in the build).
6. Click **Build**:
- In the File dialog, navigate to your ML-Agents directory.
- (For Windows) With Unity 2018.1, it will ask you to select a folder instead of a file name. Create a subfolder within the ML-Agents folder and select that folder to build. In the following steps you will refer to this subfolder's name as `env_name`.
![Build Window](images/mlagents-BuildWindow.png)

If you want to use the [Python API](Python-API.md) to interact with your executable, you can pass the name of the executable with the argument `file_name` of the `UnityEnvironment`. For instance:
```python
from mlagents.envs import UnityEnvironment
env = UnityEnvironment(file_name=<env_name>)
```

3. Change to the python directory.
4. Run `mlagents-learn <trainer-config-file> --env=<env_name> --run-id=<run-identifier> --train`
   - `<trainer-config-file>` is the filepath of the trainer configuration yaml.
   - `<env_name>` is the name and path to the executable you exported from Unity (without extension)
   - `<run-identifier>` is a string used to separate the results of different training runs
   - And the `--train` tells `mlagents-learn` to run a training session (rather than inference)
```shell
mlagents-learn config/trainer_config.yaml --env=3DBall --run-id=firstRun --train
```
![Training command example](images/training-command-example.png)

If `mlagents-learn` runs correctly and starts training, you should see something like this:
You can press Ctrl+C to stop the training, and your trained model will be at `models/<run-identifier>/<env_name>_<run-identifier>.bytes`, which corresponds to your model's latest checkpoint. You can now embed this trained model into your internal brain by following the steps below:
`MLAgentsSDK/Assets/ML-Agents/Examples/3DBall/TFModels/`.
2. Open the Unity Editor, and select the **3DBall** scene as described above.
3. Select the **Ball3DBrain** object from the Scene hierarchy.
4. Change the **Type of Brain** to **Internal**.

685
docs/ML-Agents-Overview.md


# ML-Agents Toolkit Overview
**The Unity Machine Learning Agents Toolkit** (ML-Agents Toolkit) is an
open-source Unity plugin that enables games and simulations to serve as
environments for training intelligent agents. Agents can be trained using
reinforcement learning, imitation learning, neuroevolution, or other machine
learning methods through a simple-to-use Python API. We also provide
implementations (based on TensorFlow) of state-of-the-art algorithms to enable
game developers and hobbyists to easily train intelligent agents for 2D, 3D and
VR/AR games. These trained agents can be used for multiple purposes, including
design decisions pre-release. The ML-Agents toolkit is mutually beneficial for
both game developers and AI researchers as it provides a central platform where
advances in AI can be evaluated on Unity’s rich environments and then made
accessible to the wider research and game developer communities.
Depending on your background (i.e. researcher, game developer, hobbyist), you
may have very different questions on your mind at the moment. To make your
transition to the ML-Agents toolkit easier, we provide several background pages
that include overviews and helpful resources on the [Unity
Engine](Background-Unity.md), [machine learning](Background-Machine-Learning.md)
and [TensorFlow](Background-TensorFlow.md). We **strongly** recommend browsing
the relevant background pages if you're not familiar with a Unity scene, basic
machine learning concepts or have not previously heard of TensorFlow.
components, different training modes and scenarios. By the end of it, you should
have a good sense of _what_ the ML-Agents toolkit allows you to do. The
subsequent documentation pages provide examples of _how_ to use ML-Agents.
hypothetical, running example throughout. We will explore the problem of
training the behavior of a non-playable character (NPC) in a game. (An NPC is a
game character that is never controlled by a human player and its behavior is
pre-defined by the game developer.) More specifically, let's assume we're
building a multi-player, war-themed game in which players control the soldiers.
In this game, we have a single NPC who serves as a medic, finding and reviving
wounded players. Lastly, let us assume that there are two teams, each with five
players and one NPC medic.
location. Second, it needs to be aware of which of its team members are injured
and require assistance. In the case of multiple injuries, it needs to assess the
degree of injury and decide who to help first. Lastly, a good medic will always
place itself in a position where it can quickly help its team members. Factoring
in all of these traits means that at every instance, the medic needs to measure
several attributes of the environment (e.g. position of team members, position
of enemies, which of its team members are injured and to what degree) and then
decide on an action (e.g. hide from enemy fire, move to help one of its
members). Given the large number of settings of the environment and the large
number of actions that the medic can take, defining and implementing such
complex behaviors by hand is challenging and prone to errors.
With ML-Agents, it is possible to _train_ the behaviors of such NPCs (called
**agents**) using a variety of methods. The basic idea is quite simple. We need
to define three entities at every moment of the game (called **environment**):
Observations can be numeric and/or visual. Numeric observations measure
attributes of the environment from the point of view of the agent. For our
medic this would be attributes of the battlefield that are visible to it. For
most interesting environments, an agent will require several continuous
numeric observations. Visual observations, on the other hand, are images
generated from the cameras attached to the agent and represent what the agent
is seeing at that point in time. It is common to confuse an agent's
observation with the environment (or game) **state**. The environment state
represents information about the entire scene containing all the game
characters. The agent's observation, however, only contains information that
the agent is aware of and is typically a subset of the environment state. For
example, the medic observation cannot include information about an enemy in
hiding that the medic is unaware of.
- **Actions** - what actions the medic can take. Similar to observations,
actions can either be continuous or discrete depending on the complexity of
the environment and agent. In the case of the medic, if the environment is a
simple grid world where only their location matters, then a discrete action
taking on one of four values (north, south, east, west) suffices. However, if
the environment is more complex and the medic can move freely then using two
continuous actions (one for direction and another for speed) is more
appropriate.
Note that the reward signal need not be provided at every moment, but only
when the medic performs an action that is good or bad. For example, it can
receive a large negative reward if it dies, a modest positive reward whenever
it revives a wounded team member, and a modest negative reward when a wounded
team member dies due to lack of assistance. Note that the reward signal is how
the objectives of the task are communicated to the agent, so they need to be
set up in a manner where maximizing reward generates the desired optimal
behavior.
After defining these three entities (the building blocks of a **reinforcement
learning task**), we can now _train_ the medic's behavior. This is achieved by
simulating the environment for many trials where the medic, over time, learns
what is the optimal action to take for every observation it measures by
maximizing its future reward. The key is that by learning the actions that
maximize its reward, the medic is learning the behaviors that make it a good
medic (i.e. one who saves the most number of lives). In **reinforcement
learning** terminology, the behavior that is learned is called a **policy**,
which is essentially an (optimal) mapping from observations to actions. Note that
**training phase**, while playing the game with an NPC that is using its learned
policy is called the **inference phase**.
The ML-Agents toolkit provides all the necessary tools for using Unity as the
simulation engine for learning the policies of different objects in a Unity
environment. In the next few sections, we discuss how the ML-Agents toolkit
achieves this and what features it provides.
The ML-Agents toolkit is a Unity plugin that contains three high-level
components:

- **Learning Environment** - which contains the Unity scene and all the game
characters.
- **Python API** - which contains all the machine learning algorithms that are
used for training (learning a behavior or policy). Note that, unlike the
Learning Environment, the Python API is not part of Unity, but lives outside
and communicates with Unity through the External Communicator.
- **External Communicator** - which connects the Learning Environment with the
Python API. It lives within the Learning Environment.
<p align="center">
<img src="images/learning_environment_basic.png"

_Simplified block diagram of ML-Agents._
The Learning Environment contains three additional components that help
organize the Unity scene:

- **Agents** - which is attached to a Unity GameObject (any character within a
scene) and handles generating its observations, performing the actions it
receives and assigning a reward (positive / negative) when appropriate. Each
Agent is linked to exactly one Brain.
- **Brains** - which encapsulates the logic for making decisions for the Agent.
In essence, the Brain is what holds on to the policy for each Agent and
determines which actions the Agent should take at each instance. More
specifically, it is the component that receives the observations and rewards
from the Agent and returns an action.
- **Academy** - which orchestrates the observation and decision making process.
Within the Academy, several environment-wide parameters such as the rendering
quality and the speed at which the environment is run can be specified. The
External Communicator lives within the Academy.

Every Learning Environment will always have one global Academy and one Agent for
every character in the scene. While each Agent must be linked to a Brain, it is
possible for Agents that have similar observations and actions to be linked to
the same Brain. In our sample game, we have two teams each with their own medic.
Thus we will have two Agents in our Learning Environment, one for each medic,
but both of these medics can be linked to the same Brain. Note that these two
medics are linked to the same Brain because their _space_ of observations and
actions are similar. This does not mean that at each instance they will have
identical observation and action _values_. In other words, the Brain defines the
space of all possible observations and actions, while the Agents connected to it
(in this case the medics) can each have their own, unique observation and action
values. If we expanded our game to include tank driver NPCs, then the Agent
attached to those characters cannot share a Brain with the Agent linked to the
medics (medics and drivers have different actions).
<img src="images/learning_environment_example.png"
alt="Example ML-Agents Scene Block Diagram"
<img src="images/learning_environment_example.png"
alt="Example ML-Agents Scene Block Diagram"
We have yet to discuss how the ML-Agents toolkit trains behaviors, and what role
the Python API and External Communicator play. Before we dive into those
details, let's summarize the earlier components. Each character is attached to
an Agent, and each Agent is linked to a Brain. The Brain receives observations
and rewards from the Agent and returns actions. The Academy ensures that all the
- **External** - where decisions are made using the Python API. Here, the
observations and rewards collected by the Brain are forwarded to the Python
API through the External Communicator. The Python API then returns the
corresponding action that needs to be taken by the Agent.
- **Internal** - where decisions are made using an embedded
[TensorFlow](Background-TensorFlow.md) model. The embedded TensorFlow model
represents a learned policy and the Brain directly uses this model to
determine the action for each Agent.
- **Player** - where decisions are made using real input from a keyboard or
controller. Here, a human player is controlling the Agent and the observations
and rewards collected by the Brain are not used to control the Agent.
- **Heuristic** - where decisions are made using hard-coded behavior. This
resembles how most character behaviors are currently defined and can be
helpful for debugging or comparing how an Agent with hard-coded rules compares
to an Agent whose behavior has been trained. In our example, once we have
trained a Brain for the medics we could assign a medic on one team to the
trained Brain and assign the medic on the other team a Heuristic Brain with
hard-coded behaviors. We can then evaluate which medic is more effective.
As currently described, it may seem that the External Communicator and Python
API are only leveraged by the External Brain. This is not true. It is possible
to configure the Internal, Player and Heuristic Brains to also send the
observations, rewards and actions to the Python API through the External
Communicator (a feature called _broadcasting_). As we will see shortly, this
enables additional training modes.
<img src="images/learning_environment.png"
alt="ML-Agents Scene Block Diagram"
<img src="images/learning_environment.png"
alt="ML-Agents Scene Block Diagram"
border="10" />
</p>

### Built-in Training and Inference
As mentioned previously, the ML-Agents toolkit ships with several
implementations of state-of-the-art algorithms for training intelligent agents.
In this mode, the Brain type is set to External during training and Internal
during inference. More specifically, during training, all the medics in the
scene send their observations to the Python API through the External
Communicator (this is the behavior with an External Brain). The Python API
processes these observations and sends back actions for each medic to take.
During training these actions are mostly exploratory to help the Python API
learn the best policy for each medic. Once training concludes, the learned
policy for each medic can be exported. Given that all our implementations are
based on TensorFlow, the learned policy is just a TensorFlow model file. Then
during the inference phase, we switch the Brain type to Internal and include the
TensorFlow model generated from the training phase. Now during the inference
phase, the medics still continue to generate their observations, but instead of
being sent to the Python API, they will be fed into their (internal, embedded)
model to generate the _optimal_ action for each medic to take at every point in
time.
To summarize: our built-in implementations are based on TensorFlow, thus, during
training the Python API uses the observations it receives to learn a TensorFlow
model. This model is then embedded within the Internal Brain during inference to
generate the optimal actions for all Agents linked to that Brain. **Note that
our Internal Brain is currently experimental as it is limited to TensorFlow
models and leverages the third-party
[TensorFlowSharp](https://github.com/migueldeicaza/TensorFlowSharp) library.**
The
[Getting Started with the 3D Balance Ball Example](Getting-Started-with-Balance-Ball.md)

In the previous mode, the External Brain type was used for training to generate
a TensorFlow model that the Internal Brain type can understand and use. However,
any user of the ML-Agents toolkit can leverage their own algorithms for both
training and inference. In this case, the Brain type would be set to External
for both training and inference phases and the behaviors of all the Agents in
the scene will be controlled within Python.
We do not currently have a tutorial highlighting this mode, but you can
learn more about the Python API [here](Python-API.md).
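As a rough illustration of this mode from the Python side, the sketch below
drives a single external brain with random actions; it is not a training
algorithm, only the control loop that a custom algorithm would plug into (class
and attribute names reflect this release's Python API and may differ between
versions, and `MyEnvironment` is a placeholder build name):

```python
import numpy as np
from mlagents.envs import UnityEnvironment

env = UnityEnvironment(file_name="MyEnvironment")  # placeholder build name
brain_name = env.external_brain_names[0]
brain = env.brains[brain_name]

info = env.reset(train_mode=True)[brain_name]
for _ in range(100):
    # Random continuous actions stand in for whatever your algorithm decides.
    action = np.random.randn(len(info.agents), brain.vector_action_space_size)
    info = env.step(action)[brain_name]

env.close()
```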

This mode is an extension of _Built-in Training and Inference_, and is
particularly helpful when training intricate behaviors for complex environments.
Curriculum learning is a way of training a machine learning model where more
difficult aspects of a problem are gradually introduced in such a way that the
model is always optimally challenged. This idea has been around for a long time,
and it is how we humans typically learn. If you imagine any childhood primary
school education, there is an ordering of classes and topics. Arithmetic is
taught before algebra, for example. Likewise, algebra is taught before calculus.
The skills and knowledge learned in the earlier subjects provide a scaffolding
for later lessons. The same principle can be applied to machine learning, where
training on easier tasks can provide a scaffolding for harder tasks in the
future.
<img src="images/math.png"
alt="Example Math Curriculum"
<img src="images/math.png"
alt="Example Math Curriculum"
_Example of a mathematics curriculum. Lessons progress from simpler topics to
more complex ones, with each building on the last._
When we think about how reinforcement learning actually works, the learning
signal is reward received occasionally throughout training. The starting point
when training an agent to accomplish this task will be a random policy. That
starting policy will have the agent running in circles, and will likely never,
or very rarely achieve the reward for complex environments. Thus by simplifying
the environment at the beginning of training, we allow the agent to quickly
update the random policy to a more meaningful one that is successively improved
as the environment gradually increases in complexity. In our example, we can
imagine first training the medic when each team only contains one player, and
then iteratively increasing the number of players (i.e. the environment
complexity). The ML-Agents toolkit supports setting custom environment
parameters within the Academy. This allows elements of the environment related
to difficulty or complexity to be dynamically adjusted based on training
progress.
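From the Python side, this typically surfaces as reset parameters defined on the
Academy that can be overridden when resetting the environment. A hedged sketch
(the parameter name `num_players` and the build name `MyEnvironment` are
illustrative, and the `config` argument assumes this release's
`UnityEnvironment.reset` signature):

```python
from mlagents.envs import UnityEnvironment

env = UnityEnvironment(file_name="MyEnvironment")  # placeholder build name

# Begin with a simplified environment, then raise the difficulty later on.
env.reset(train_mode=True, config={"num_players": 1})
# ... train for a while, then continue with a harder configuration ...
env.reset(train_mode=True, config={"num_players": 5})

env.close()
```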
The [Training with Curriculum Learning](Training-Curriculum-Learning.md)
tutorial covers this training mode with the **Wall Area** sample environment.

It is often more intuitive to simply demonstrate the behavior we want an agent
to perform, rather than attempting to have it learn via trial-and-error methods.
For example, instead of training the medic by setting up its reward function,
this mode allows providing real examples from a game controller on how the medic
should behave. More specifically, in this mode, the Brain type during training
is set to Player and all the actions performed with the controller (in addition
to the agent observations) will be recorded and sent to the Python API. The
imitation learning algorithm will then use these pairs of observations and
actions from the human player to learn a policy. [Video
Link](https://youtu.be/kpb8ZkMBFYs).
The [Training with Imitation Learning](Training-Imitation-Learning.md) tutorial
covers this training mode with the **Banana Collector** sample environment.
While the discussion so far has mostly focused on training a single agent, with
ML-Agents, several training scenarios are possible. We are excited to see what
kinds of novel and fun environments the community creates. For those new to
training intelligent agents, below are a few examples that can serve as
inspiration:

- Single-Agent. A single Agent linked to a single Brain, with its own reward
signal. The traditional way of training an agent. An example is any
single-player game, such as Chicken. [Video
Link](https://www.youtube.com/watch?v=fiQsmdwEGT8&feature=youtu.be).
- Simultaneous Single-Agent. Multiple independent Agents with independent reward
signals linked to a single Brain. A parallelized version of the traditional
training scenario, which can speed-up and stabilize the training process.
Helpful when you have multiple versions of the same character in an
environment who should learn similar behaviors. An example might be training a
dozen robot-arms to each open a door simultaneously. [Video
Link](https://www.youtube.com/watch?v=fq0JBaiCYNA).
- Adversarial Self-Play. Two interacting Agents with inverse reward signals
linked to a single Brain. In two-player games, adversarial self-play can allow
an agent to become increasingly more skilled, while always having the
perfectly matched opponent: itself. This was the strategy employed when
training AlphaGo, and more recently used by OpenAI to train a human-beating
1-vs-1 Dota 2 agent.
- Cooperative Multi-Agent. Multiple interacting Agents with a shared reward
signal linked to either a single or multiple different Brains. In this
scenario, all agents must work together to accomplish a task that cannot be
done alone. Examples include environments where each agent only has access to
partial information, which needs to be shared in order to accomplish the task
or collaboratively solve a puzzle.
- Competitive Multi-Agent. Multiple interacting Agents with inverse reward
signals linked to either a single or multiple different Brains. In this
scenario, agents must compete with one another to either win a competition, or
obtain some limited set of resources. All team sports fall into this scenario.
- Ecosystem. Multiple interacting Agents with independent reward signals linked
to either a single or multiple different Brains. This scenario can be thought
of as creating a small world in which animals with different goals all
interact, such as a savanna in which there might be zebras, elephants and
giraffes, or an autonomous driving simulation within an urban environment.
Beyond the flexible training scenarios available, the ML-Agents toolkit includes
* **On Demand Decision Making** - With the ML-Agents toolkit it is possible to have agents
request decisions only when needed as opposed to requesting decisions at
every step of the environment. This enables training of turn-based games,