Browse code

Updating docs with new paths.

/develop-generalizationTraining-TrainerController
Deric Pang, 6 years ago
Current commit
709141f4
19 files changed, with 957 additions and 808 deletions
  1. docs/API-Reference.md (25)
  2. docs/Background-Jupyter.md (15)
  3. docs/Basic-Guide.md (145)
  4. docs/FAQ.md (109)
  5. docs/Getting-Started-with-Balance-Ball.md (381)
  6. docs/Glossary.md (68)
  7. docs/Installation.md (52)
  8. docs/Learning-Environment-Create-New.md (12)
  9. docs/Learning-Environment-Examples.md (2)
  10. docs/Learning-Environment-Executable.md (6)
  11. docs/ML-Agents-Overview.md (685)
  12. docs/Python-API.md (135)
  13. docs/Readme.md (81)
  14. docs/Training-Curriculum-Learning.md (8)
  15. docs/Training-Imitation-Learning.md (2)
  16. docs/Training-ML-Agents.md (6)
  17. docs/dox-ml-agents.conf (8)
  18. gym-unity/Readme.md (16)
  19. ml-agents-protobuf/README.md (9)

25
docs/API-Reference.md


# API Reference
Our developer-facing C# classes (Academy, Agent, Decision and Monitor) have been
documented to be compatible with
[Doxygen](http://www.stack.nl/~dimitri/doxygen/) for auto-generating HTML
documentation.
To generate the API reference, [download
Doxygen](http://www.stack.nl/~dimitri/doxygen/download.html) and run the
following command within the `docs/` directory:
that includes the classes that have been properly formatted. The generated HTML
files will be placed in the `html/` subdirectory. Open `index.html` within that
subdirectory to navigate to the API reference home. Note that `html/` is already
included in the repository's `.gitignore` file.
In the near future, we aim to expand our documentation to include all the Unity
C# classes and Python API.

15
docs/Background-Jupyter.md


# Background: Jupyter
[Jupyter](https://jupyter.org) is a fantastic tool for writing code with
embedded visualizations. We provide one such notebook,
`notebooks/getting-started.ipynb`, for testing the Python control
interface to a Unity build. This notebook is introduced in the [Getting Started
with the 3D Balance Ball Environment](Getting-Started-with-Balance-Ball.md)
in the _Jupyter/IPython Quick Start Guide_. To launch Jupyter, run in the
command line:
```shell
jupyter notebook
```
Then navigate to `localhost:8888` to access your notebooks.

145
docs/Basic-Guide.md


# Basic Guide
This guide will show you how to use a pretrained model in an example Unity
environment, and show you how to train the model yourself.
If you are not familiar with the [Unity Engine](https://unity3d.com/unity), we
highly recommend the [Roll-a-ball
tutorial](https://unity3d.com/learn/tutorials/s/roll-ball-tutorial) to learn all
the basic concepts of Unity.
In order to use the ML-Agents toolkit within Unity, you need to change some
Unity settings first. Also, the [TensorFlowSharp
plugin](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage)
is needed to use pretrained models within Unity; it is based on the
[TensorFlowSharp repo](https://github.com/migueldeicaza/TensorFlowSharp).
3. Using the file dialog that opens, locate the `MLAgentsSDK` folder within
the ML-Agents toolkit project and click **Open**.
5. For **each** of the platforms you target (**PC, Mac and Linux Standalone**,
**iOS** or **Android**):
2. Set **Scripting Runtime Version** to **Experimental (.NET 4.6
Equivalent or .NET 4.x Equivalent)**.
3. In **Scripting Defined Symbols**, add the flag `ENABLE_TENSORFLOW`. After
typing in the flag name, press Enter.
[Download](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage)
the TensorFlowSharp plugin. Then import it into Unity by double clicking the
downloaded file. You can check if it was successfully imported by checking the
TensorFlow files in the Project window under **Assets** > **ML-Agents** >
**Plugins** > **Computer**.
**Note**: If you don't see anything under **Assets**, drag the
`ml-agents/MLAgentsSDK/Assets/ML-Agents` folder under **Assets** within
Project window.
1. In the **Project** window, go to `Assets/ML-Agents/Examples/3DBall` folder
and open the `3DBall` scene file.
2. In the **Hierarchy** window, select the **Ball3DBrain** child under the
**Ball3DAcademy** GameObject to view its properties in the Inspector window.
3. On the **Ball3DBrain** object's **Brain** component, change the **Brain
Type** to **Internal**.
4. In the **Project** window, locate the
`Assets/ML-Agents/Examples/3DBall/TFModels` folder.
5. Drag the `3DBall` model file from the `TFModels` folder to the **Graph
Model** field of the **Ball3DBrain** object's **Brain** component.
6. Click the **Play** button and you will see the platforms balance the balls
using the pretrained model.
The `notebooks/getting-started.ipynb` [Jupyter notebook](Background-Jupyter.md)
contains a simple walkthrough of the functionality of the Python API. It can
also serve as a simple test that your environment is configured correctly.
Within the notebook, be sure to set `env_name` to the name of the Unity
executable if you want to [use an
executable](Learning-Environment-Executable.md) or to `None` if you want to
interact with the current scene in the Unity Editor.
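As a sketch of what that configuration step looks like in the notebook's first cell (the variable name `env_name` comes from the description above; the executable name `"3DBall"` is only an illustrative assumption):

```python
# Sketch of the first cell of notebooks/getting-started.ipynb.
# "3DBall" is a hypothetical executable name -- substitute your own build.

train_in_editor = True  # press Play in the Unity Editor when prompted

# Name of the Unity executable to launch, or None to attach to the
# scene currently open in the Editor:
env_name = None if train_in_editor else "3DBall"
print(env_name)  # None
```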
Since we are going to build this environment to conduct training, we need to set
the brain used by the agents to **External**. This allows the agents to
1. In the **Scene** window, click the triangle icon next to the Ball3DAcademy
object.
2. Select its child object **Ball3DBrain**.
3. In the Inspector window, set **Brain Type** to **External**.

1. Open a command or terminal window.
2. Navigate to the folder where you installed the ML-Agents toolkit.
3. Run `learn <trainer-config-path> --run-id=<run-identifier> --train`, where:
   - `<trainer-config-path>` is the relative or absolute filepath of the
     trainer configuration. The defaults used by environments in the ML-Agents
     SDK can be found in `trainer_config.yaml`.
   - `<run-identifier>` is a string used to separate the results of different
     training runs.
   - The `--train` flag runs a training session (rather than inference).
4. When the message _"Start training by pressing the Play button in the Unity
   Editor"_ is displayed on the screen, you can press the :arrow_forward:
   button in Unity to start training in the Editor.

**Note**: Alternatively, you can use an executable rather than the Editor to
perform training. Please refer to [this
page](Learning-Environment-Executable.md) for instructions on how to build and
use an executable.
**Note**: If you're using Anaconda, don't forget to activate the ml-agents
environment first.
If learn.py runs correctly and starts training, you should see something like
this:
You can press Ctrl+C to stop the training, and your trained model will be at
`models/<run-identifier>/editor_<academy_name>_<run-identifier>.bytes` where
`<academy_name>` is the name of the Academy GameObject in the current scene.
This file corresponds to your model's latest checkpoint. You can now embed this
trained model into your internal brain by following the steps below, which is
similar to the steps described
[above](#play-an-example-environment-using-pretrained-model).
1. Move your model file into
`MLAgentsSDK/Assets/ML-Agents/Examples/3DBall/TFModels/`.
5. Drag the `<env_name>_<run-identifier>.bytes` file from the Project window of
the Editor to the **Graph Model** placeholder in the **Ball3DBrain**
inspector window.
- For more information on the ML-Agents toolkit, in addition to helpful
background, check out the [ML-Agents Toolkit Overview](ML-Agents-Overview.md)
page.
- For a more detailed walk-through of our 3D Balance Ball environment, check out
the [Getting Started](Getting-Started-with-Balance-Ball.md) page.
- For a "Hello World" introduction to creating your own learning environment,
check out the [Making a New Learning
Environment](Learning-Environment-Create-New.md) page.
- For a series of YouTube video tutorials, check out the [Machine Learning
  Agents
  PlayList](https://www.youtube.com/playlist?list=PLX2vGYjWbI0R08eWQkO7nQkGiicHAX7IX)
  page.

109
docs/FAQ.md


# Frequently Asked Questions
## Scripting Runtime Environment not set up correctly
If you haven't switched your scripting runtime version from .NET 3.5 to .NET
4.6 or .NET 4.x, you will see an error message like the following:
This is because .NET 3.5 doesn't support the `Clear()` method on
`StringBuilder`; refer to [Setting Up The ML-Agents Toolkit Within
Unity](Installation.md#setting-up-ml-agent-within-unity) for the solution.
## TensorFlowSharp flag not turned on
If you have already imported the TensorFlowSharp plugin but haven't set the
ENABLE_TENSORFLOW flag in your scripting define symbols, you will see the
following error message:
You need to install and enable the TensorFlowSharp plugin in order to use the internal brain.
This error message occurs because the TensorFlowSharp plugin won't be usable
without the ENABLE_TENSORFLOW flag; refer to [Setting Up The ML-Agents Toolkit
Within Unity](Installation.md#setting-up-ml-agent-within-unity) for the
solution.
## TensorFlow epsilon placeholder error
If you have a graph placeholder set in the internal Brain inspector that is not
present in the TensorFlow graph, you will see an error like this:
UnityAgentsException: One of the Tensorflow placeholder could not be found. In brain <some_brain_name>, there are no FloatingPoint placeholder named <some_placeholder_name>.
Solution: Go to each of your Brain objects, find `Graph placeholders` and
change its `size` to 0 to remove the `epsilon` placeholder.
Similarly, if you have a graph scope set in the internal Brain inspector that is
not correctly set, you will see some error like this:
Solution: Make sure your Graph Scope field matches the corresponding brain
object name in your Hierarchy Inspector when there are multiple brains.
## Environment Permission Error
If you directly import your Unity environment without building it in the
editor, you might need to give it additional permissions to execute it.
```shell
chmod -R 755 *.app
```
```shell
chmod -R 755 *.x86_64
```
On Windows, you can find
## Environment Connection Timeout
If you are able to launch the environment from `UnityEnvironment` but then
receive a timeout error, there may be a number of possible causes.
* _Cause_: There may be no Brains in your environment which are set to
  `External`. In this case, the environment will not attempt to communicate
  with Python. _Solution_: Set the Brain(s) you wish to externally control
  through the Python API to `External` from the Unity Editor, and rebuild the
  environment.
* _Cause_: On OSX, the firewall may be preventing communication with the
  environment. _Solution_: Add the built environment binary to the list of
  exceptions on the firewall by following [these
  instructions](https://support.apple.com/en-us/HT201642).
* _Cause_: An error happened in the Unity Environment preventing
  communication. _Solution_: Look into the [log
  files](https://docs.unity3d.com/Manual/LogFiles.html) generated by the Unity
  Environment to figure out what error happened.
## Communication port {} still in use
If you receive an exception `"Couldn't launch new environment because
communication port {} is still in use. "`, you can change the worker number in
the Python script when calling
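As a sketch of why bumping the worker number works: the toolkit derives the communication port from a base port plus the worker number, so a different worker number lands on a different port. The `BASE_PORT` value of `5005` below is an assumption about the implementation, not something stated in this document:

```python
# Sketch: one socket per environment instance, derived from a base port plus
# the worker number. If a crashed run left worker 0's port occupied, starting
# the next run with a higher worker number avoids the clash.
# BASE_PORT = 5005 is an assumed value.
BASE_PORT = 5005

def comm_port(worker_id: int) -> int:
    """Port used for the Python <-> Unity connection of this worker."""
    return BASE_PORT + worker_id

print(comm_port(0))  # 5005 -- possibly still held by the stale run
print(comm_port(1))  # 5006 -- free, so pass a worker number of 1 instead
```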
## Mean reward : nan
If you receive a message `Mean reward : nan` when attempting to train a model
using PPO, this is due to the episodes of the learning environment not
terminating. In order to address this, set `Max Steps` for either the Academy or
Agents within the Scene Inspector to a value greater than 0. Alternatively, it
is possible to manually set `done` conditions for episodes from within scripts
for custom episode-terminating events.
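The `nan` itself is easy to reproduce: the trainer averages rewards over completed episodes, so if no episode ever terminates there is nothing to average. A sketch of the arithmetic (not the trainer's actual code):

```python
import math
import numpy as np

# No episode has hit Max Steps or a done condition, so the list of
# completed-episode rewards stays empty for the whole reporting period.
completed_episode_rewards = []

# numpy warns about the empty slice and returns NaN:
mean_reward = np.mean(completed_episode_rewards)

print(math.isnan(mean_reward))  # True -- reported as "Mean reward : nan"
```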

381
docs/Getting-Started-with-Balance-Ball.md


# Getting Started with the 3D Balance Ball Environment
This tutorial walks through the end-to-end process of opening an ML-Agents
toolkit example environment in Unity, building the Unity executable, training
an agent in it, and finally embedding the trained model into the Unity
environment.
The ML-Agents toolkit includes a number of [example
environments](Learning-Environment-Examples.md) which you can examine to help
understand the different ways in which the ML-Agents toolkit can be used. These
environments can also serve as templates for new environments or as ways to test
new ML algorithms. After reading this tutorial, you should be able to explore
and build the example environments.
This walk-through uses the **3D Balance Ball** environment. 3D Balance Ball
contains a number of platforms and balls (which are all copies of each other).
Each platform tries to keep its ball from falling by rotating either
horizontally or vertically. In this environment, a platform is an **agent** that
receives a reward for every step that it balances the ball. An agent is also
penalized with a negative reward for dropping the ball. The goal of the training
process is to have the platforms learn to never drop the ball.
In order to install and set up the ML-Agents toolkit, the Python dependencies
and Unity, see the [installation instructions](Installation.md).
An agent is an autonomous actor that observes and interacts with an
_environment_. In the context of Unity, an environment is a scene containing an
Academy and one or more Brain and Agent objects, and, of course, the other
**Note:** In Unity, the base object of everything in a scene is the
_GameObject_. The GameObject is essentially a container for everything else,
including behaviors, graphics, physics, etc. To see the components that make up
a GameObject, select the GameObject in the Scene window, and open the Inspector
window. The Inspector shows every component on a GameObject.
The first thing you may notice after opening the 3D Balance Ball scene is that
it contains not one, but several platforms. Each platform in the scene is an
independent agent, but they all share the same brain. 3D Balance Ball does this
The Academy object for the scene is placed on the Ball3DAcademy GameObject. When
you look at an Academy component in the inspector, you can see several
properties that control how the environment works. For example, the **Training**
and **Inference Configuration** properties set the graphics and timescale
properties for the Unity application. The Academy uses the **Training
Configuration** during training and the **Inference Configuration** when not
training. (*Inference* means that the agent is using a trained model or
heuristics or direct control — in other words, whenever **not** training.)
Typically, you set low graphics quality and a high time scale for the
**Training Configuration**, and a high graphics quality and the timescale to
`1.0` for the **Inference Configuration**.
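That split can be summarized roughly as follows (all values are illustrative assumptions, not the example project's actual settings):

```python
# Illustrative Academy configuration values (assumed, not from the project):
training_configuration = {
    "width": 80, "height": 80,   # small window
    "quality_level": 0,          # low graphics quality
    "time_scale": 100.0,         # run the simulation fast while training
}
inference_configuration = {
    "width": 1280, "height": 720,
    "quality_level": 5,          # high graphics quality
    "time_scale": 1.0,           # real time, as the text recommends
}

print(inference_configuration["time_scale"])  # 1.0
```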
**Note:** if you want to observe the environment during training, you can adjust
the **Inference Configuration** settings to use a larger window and a timescale
closer to 1:1. Be sure to set these parameters back when training in earnest;
otherwise, training can take a very long time.
Another aspect of an environment to look at is the Academy implementation. Since
the base Academy class is abstract, you must always define a subclass. There are
three functions you can implement, though they are all optional:
* Academy.AcademyStep() — Called at every simulation step before
Agent.AgentAction() (and after the agents collect their observations).
* Academy.AcademyReset() — Called when the Academy starts or restarts the
simulation (including the first time).
The 3D Balance Ball environment does not use these functions — each agent resets
itself when needed — but many environments do use these functions to control the
environment around the agents.
The Ball3DBrain GameObject in the scene, which contains a Brain component, is a
child of the Academy object. (All Brain objects in a scene must be children of
the Academy.) All the agents in the 3D Balance Ball environment use the same
Brain instance. A Brain doesn't store any information about an agent, it just
routes the agent's collected observations to the decision making process and
returns the chosen action to the agent. Thus, all agents can share the same
brain, but act independently. The Brain settings tell you quite a bit about how
an agent works.
The **Brain Type** determines how an agent makes its decisions. The **External**
and **Internal** types work together — use **External** when training your
agents; use **Internal** when using the trained model. The **Heuristic** brain
allows you to hand-code the agent's logic by extending the Decision class.
Finally, the **Player** brain lets you map keyboard commands to actions, which
can be useful when testing your agents and environment. If none of these types
of brains do what you need, you can implement your own CoreBrain to create your
own type.
In this tutorial, you will set the **Brain Type** to **External** for training;
In this tutorial, you will set the **Brain Type** to **External** for training;
#### Vector Observation Space
Before making a decision, an agent collects its observation about its state in
the world. The vector observation is a vector of floating point numbers which
contain relevant information for the agent to make decisions.
The Brain instance used in the 3D Balance Ball example uses the **Continuous**
vector observation space with a **State Size** of 8. This means that the feature
vector containing the agent's observations contains eight elements: the `x` and
`z` components of the platform's rotation and the `x`, `y`, and `z` components
of the ball's relative position and velocity. (The observation values are
defined in the agent's `CollectObservations()` function.)
#### Vector Action Space
An agent is given instructions from the brain in the form of *actions*.
ML-Agents toolkit classifies actions into two types: the **Continuous** vector
action space is a vector of numbers that can vary continuously. What each
element of the vector means is defined by the agent logic (the PPO training
process just learns what values are better given particular state observations
based on the rewards received when it tries different values). For example, an
element might represent a force or torque applied to a `RigidBody` in the agent.
The **Discrete** action vector space defines its actions as tables. An action
given to the agent is an array of indices into tables.
You can try training with both settings to observe whether there is a
difference. (Set the `Vector Action Space Size` to 4 when using the discrete
space.)
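To make the two action space types concrete, here is a small Python sketch (illustrative only; the table contents and values are made up, and the real agent logic is written in C#): a continuous action is a vector of floats, while a discrete action is an array of indices into predefined tables.

```python
# Continuous vector action: a list of floats whose meaning is defined
# by the agent logic (e.g. torques applied to the platform).
continuous_action = [0.04, -0.12]

# Discrete vector action: each table lists the possible actions, and the
# agent receives one index per table. These names are made up.
action_tables = [
    ["rotate_x_neg", "rotate_x_pos"],  # table 0
    ["rotate_z_neg", "rotate_z_pos"],  # table 1
]
discrete_action = [1, 0]  # an array of indices, one into each table

# Resolve the indices into concrete actions.
chosen = [table[i] for table, i in zip(action_tables, discrete_action)]
print(chosen)  # ['rotate_x_pos', 'rotate_z_neg']
```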
The Agent is the actor that observes and takes actions in the environment. In
the 3D Balance Ball environment, the Agent components are placed on the twelve
Platform GameObjects. The base Agent object has a few properties that affect its
behavior:
* **Brain** — Every agent must have a Brain. The brain determines how an agent
makes decisions. All the agents in the 3D Balance Ball scene share the same
brain.
observe its environment. 3D Balance Ball does not use camera observations.
* **Max Step** — Defines how many simulation steps can occur before the agent
decides it is done. In 3D Balance Ball, an agent restarts after 5000 steps.
3D Balance Ball sets this true so that the agent restarts after reaching the
**Max Step** count or after dropping the ball.
Perhaps the more interesting aspect of an agent is the Agent subclass
implementation. When you create an agent, you must extend the base Agent class.
* Agent.AgentReset() — Called when the Agent resets, including at the beginning
of a session. The Ball3DAgent class uses the reset function to reset the
platform and ball. The function randomizes the reset values so that the
training generalizes to more than a specific starting position and platform
attitude.
* Agent.CollectObservations() — Called every simulation step. Responsible for
collecting the agent's observations of the environment. Since the Brain
instance assigned to the agent is set to the continuous vector observation
space with a state size of 8, the `CollectObservations()` function must call
`AddVectorObs` 8 times.
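As a rough Python analogue of that pattern (the real implementation is C#, and the argument names here are hypothetical stand-ins), gathering the eight observation values might look like:

```python
def collect_observations(platform_rotation, ball_rel_position, ball_velocity):
    """Build the 8-element observation vector described above.

    platform_rotation: (x, y, z) rotation of the platform.
    ball_rel_position, ball_velocity: (x, y, z) vectors for the ball.
    All names are illustrative; the actual agent calls AddVectorObs
    once per value in C#.
    """
    obs = []
    # x and z components of the platform's rotation (2 values)
    obs.append(platform_rotation[0])
    obs.append(platform_rotation[2])
    # relative position of the ball (3 values)
    obs.extend(ball_rel_position)
    # velocity of the ball (3 values)
    obs.extend(ball_velocity)
    return obs

obs = collect_observations((0.1, 0.0, -0.2), (0.0, 1.0, 0.0), (0.0, -0.5, 0.0))
print(len(obs))  # 8
```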
by the brain. The Ball3DAgent example handles both the continuous and the
discrete action space types. There isn't actually much difference between the
two state types in this environment — both vector action spaces result in a
small change in platform rotation at each step. The `AgentAction()` function
assigns a reward to the agent; in this example, an agent receives a small
positive reward for each step it keeps the ball on the platform and a larger,
negative reward for dropping the ball. An agent is also marked as done when it
drops the ball so that it will reset with a new ball for the next simulation
step.
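The reward-and-done logic just described can be sketched in Python (the numbers and function name are illustrative, not the actual C# implementation):

```python
def step_reward(ball_on_platform):
    """Return (reward, done) for one simulation step.

    Illustrative values: a small positive reward while the ball stays on
    the platform, and a larger negative reward plus done=True when the
    ball drops, so the agent resets with a new ball.
    """
    if ball_on_platform:
        return 0.1, False
    return -1.0, True

print(step_reward(True), step_reward(False))
```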
## Training the Brain with Reinforcement Learning

In order to train an agent to correctly balance the ball, we will use a
Reinforcement Learning algorithm called Proximal Policy Optimization (PPO). This
is a method that has been shown to be safe, efficient, and more general purpose
than many other RL algorithms; as such, we have chosen it as the example
algorithm for use with the ML-Agents toolkit. For more information on PPO,
OpenAI has a recent [blog post](https://blog.openai.com/openai-baselines-ppo/)
explaining it.
To train the agents within the Ball Balance environment, we will be using the
python package. We have provided a convenient Python wrapper script called
`learn.py` which accepts arguments used to configure both training and inference
phases.
We can use `run_id` to identify the experiment and create a folder where the
model and summary statistics are stored. When using TensorBoard to observe the
training statistics, it helps to set this to a sequential value for each
training run. In other words, "BalanceBall1" for the first run, "BalanceBall2"
for the second, and so on. If you don't, the summaries for every training run
are saved to the same directory and will all be included on the same graph.
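For example (a sketch of the naming convention, not part of the toolkit), a distinct run-id per run gives each experiment its own summaries folder, so TensorBoard plots each run as a separate curve:

```python
import os

def summary_dir(run_id, root="summaries"):
    # Each run-id gets its own subdirectory under the summaries root,
    # which is what keeps runs separate in TensorBoard.
    return os.path.join(root, run_id)

print([summary_dir("BalanceBall{}".format(i)) for i in (1, 2)])
```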
To summarize, go to your command line, enter the `ml-agents` directory and type:
```shell
learn trainer_config.yaml --run-id=<run-identifier> --train
```
When the message _"Start training by pressing the Play button in the Unity
Editor"_ is displayed on the screen, you can press the :arrow_forward: button in
Unity to start training in the Editor.
**Note**: If you're using Anaconda, don't forget to activate the ml-agents
environment first.
The `--train` flag tells the ML-Agents toolkit to run in training mode.
**Note**: You can train using an executable rather than the Editor. To do so,
follow the instructions in
[Using an Executable](Learning-Environment-Executable.md).
Once you start training using `learn.py` in the way described in the previous
section, the `ml-agents` directory will contain a `summaries` directory. In
order to observe the training process in more detail, you can use TensorBoard.
From the command line run:
```shell
tensorboard --logdir=summaries
```
* Lesson - only interesting when performing [curriculum
training](Training-Curriculum-Learning.md). This is not used in the 3D Balance
Ball environment.
* Cumulative Reward - The mean cumulative episode reward over all agents. Should
increase during a successful training session.
* Entropy - How random the decisions of the model are. Should slowly decrease
during a successful training process. If it decreases too quickly, the `beta`
hyperparameter should be increased.
* Episode Length - The mean length of each episode in the environment for all
agents.
* Learning Rate - How large a step the training algorithm takes as it searches
for the optimal policy. Should decrease over time.
much the policy (process for deciding actions) is changing. The magnitude of
this should decrease during a successful training session.
* Value Estimate - The mean value estimate for all states visited by the agent.
Should increase during a successful training session.
well the model is able to predict the value of each state. This should
decrease during a successful training session.
Once the training process completes and saves the model (denoted by the
`Saved Model` message), you can add it to the Unity project and use it with
agents having an **Internal** brain type.

**Note:** Do not just close the Unity Window once the `Saved Model` message
appears. Either wait for the training process to close the window or press
Ctrl+C at the command-line prompt. If you simply close the window manually, the
.bytes file containing the trained model is not exported into the ml-agents
folder.
Because TensorFlowSharp support is still experimental, it is disabled by
default. In order to enable it, you must follow these steps. Please note that
To set up TensorFlowSharp support, follow the [Setting up ML-Agents Toolkit
within Unity](Basic-Guide.md#setting-up-ml-agents-within-unity) section of the
Basic Guide page.
To embed the trained model into Unity, follow the later part of the [Training
the Brain with Reinforcement
Learning](Basic-Guide.md#training-the-brain-with-reinforcement-learning)
section of the Basic Guide page.

68
docs/Glossary.md


# ML-Agents Toolkit Glossary
* **Academy** - Unity Component which controls timing, reset, and
training/inference settings of the environment.
* **Action** - The carrying-out of a decision on the part of an agent within the
environment.
* **Agent** - Unity Component which produces observations and takes actions in
the environment. Agents' actions are determined by decisions produced by a
linked Brain.
* **Brain** - Unity Component which makes decisions for the agents linked to it.
* **Decision** - The specification produced by a Brain for an action to be
carried out given an observation.
* **Editor** - The Unity Editor, which may include any pane (e.g. Hierarchy,
Scene, Inspector).
* **Environment** - The Unity scene which contains Agents, Academy, and Brains.
* **FixedUpdate** - Unity method called each time the game engine is
stepped. ML-Agents logic should be placed here.
* **Frame** - An instance of rendering the main camera for the display.
Corresponds to each `Update` call of the game engine.
* **Observation** - Partial information describing the state of the environment
available to a given agent. (e.g. Vector, Visual, Text)
* **Policy** - Function for producing decisions from observations.
* **Reward** - Signal provided at every step used to indicate desirability of an
agent’s action within the current state of the environment.
* **State** - The underlying properties of the environment (including all agents
within it) at a given time.
* **Step** - Corresponds to each `FixedUpdate` call of the game engine. Is the
smallest atomic change to the state possible.
* **Update** - Unity function called each time a frame is rendered. ML-Agents
logic should not be placed here.
* **External Coordinator** - ML-Agents class responsible for communication with
outside processes (in this case, the Python API).
* **Trainer** - Python class which is responsible for training a given external
brain. Contains TensorFlow graph which makes decisions for external brain.

52
docs/Installation.md


_Linux Build Support_ component when installing Unity.
<p align="center">
<img src="images/unity_linux_build_support.png"
alt="Linux Build Support"
Once installed, you will want to clone the ML-Agents Toolkit GitHub repository.
The `MLAgentsSDK` directory in this repository contains the Unity Assets
Both directories are located at the root of the repository.
the dependencies listed in the [requirements file](../requirements.txt).
- [TensorFlow](Background-TensorFlow.md)
- [Jupyter](Background-Jupyter.md)
**NOTES**
- If you are using Anaconda and are having trouble with TensorFlow, please see
the following
[note](https://www.tensorflow.org/install/install_mac#installing_with_anaconda)
on how to install TensorFlow in an Anaconda environment.
If you are a Windows user who is new to Python and TensorFlow, follow [this
guide](Installation-Windows.md) to set up your Python environment.
[Download](https://www.python.org/downloads/) and install Python 3 if you do not
already have it.
If your Python environment doesn't include `pip`, see these
To install dependencies, **go into the `python` subdirectory** of the
repository, and run from the command line:
If you'd like to use Docker for ML-Agents, please follow
[this guide](Using-Docker.md).
The [Basic Guide](Basic-Guide.md) page contains several short tutorials on
setting up the ML-Agents toolkit within Unity, running a pre-trained model, in
If you run into any problems regarding ML-Agents, refer to our [FAQ](FAQ.md) and
our [Limitations](Limitations.md) pages. If you can't find anything please
make sure to cite relevant information on OS, Python version, and exact error
message (whenever possible).

12
docs/Learning-Environment-Create-New.md


2. In a file system window, navigate to the folder containing your cloned ML-Agents repository.
3. Drag the `ML-Agents` folder from `MLAgentsSDK/Assets` to the Unity Editor Project window.
Your Unity **Project** window should contain the following assets:

Press **Play** to run the scene and use the WASD keys to move the agent around the platform. Make sure that there are no errors displayed in the Unity editor Console window and that the agent resets when it reaches its target or falls from the platform. Note that for more involved debugging, the ML-Agents SDK includes a convenient Monitor class that you can use to easily display agent status information in the Game window.
One additional test you can perform is to first ensure that your environment and
the Python API work as expected using the `notebooks/getting-started.ipynb`
[Jupyter notebook](Background-Jupyter.md). Within the notebook, be sure to set
`env_name` to the name of the environment file you specify when building this
environment.
Now you can train the Agent. To get ready for training, you must first change the **Brain Type** from **Player** to **External**. From there, the process is the same as described in [Training ML-Agents](Training-ML-Agents.md).

2
docs/Learning-Environment-Examples.md


The Unity ML-Agents toolkit contains an expanding set of example environments which
demonstrate various features of the platform. Environments are located in
`MLAgentsSDK/Assets/ML-Agents/Examples` and summarized below.
Additionally, our
[first ML Challenge](https://connect.unity.com/challenges/ml-agents-1)
contains environments created by the community.

6
docs/Learning-Environment-Executable.md


1. Launch Unity.
2. On the Projects dialog, choose the **Open** option at the top of the window.
3. Using the file dialog that opens, locate the `MLAgentsSDK` folder
within the ML-Agents project and click **Open**.
4. In the **Project** window, navigate to the folder
`Assets/ML-Agents/Examples/3DBall/`.

![Training running](images/training-running.png)
You can press Ctrl+C to stop the training, and your trained model will be at `models/<run-identifier>/<env_name>_<run-identifier>.bytes`, which corresponds to your model's latest checkpoint. You can now embed this trained model into your internal brain by following the steps below:
`MLAgentsSDK/Assets/ML-Agents/Examples/3DBall/TFModels/`.
2. Open the Unity Editor, and select the **3DBall** scene as described above.
3. Select the **Ball3DBrain** object from the Scene hierarchy.
4. Change the **Type of Brain** to **Internal**.

685
docs/ML-Agents-Overview.md


# ML-Agents Toolkit Overview
**The Unity Machine Learning Agents Toolkit** (ML-Agents Toolkit) is an
open-source Unity plugin that enables games and simulations to serve as
environments for training intelligent agents. Agents can be trained using
reinforcement learning, imitation learning, neuroevolution, or other machine
learning methods through a simple-to-use Python API. We also provide
implementations (based on TensorFlow) of state-of-the-art algorithms to enable
game developers and hobbyists to easily train intelligent agents for 2D, 3D and
VR/AR games. These trained agents can be used for multiple purposes, including
design decisions pre-release. The ML-Agents toolkit is mutually beneficial for
both game developers and AI researchers as it provides a central platform where
advances in AI can be evaluated on Unity’s rich environments and then made
accessible to the wider research and game developer communities.
Depending on your background (i.e. researcher, game developer, hobbyist), you
may have very different questions on your mind at the moment. To make your
transition to the ML-Agents toolkit easier, we provide several background pages
that include overviews and helpful resources on the [Unity
Engine](Background-Unity.md), [machine learning](Background-Machine-Learning.md)
and [TensorFlow](Background-TensorFlow.md). We **strongly** recommend browsing
the relevant background pages if you're not familiar with a Unity scene, basic
machine learning concepts or have not previously heard of TensorFlow.
components, different training modes and scenarios. By the end of it, you should
have a good sense of _what_ the ML-Agents toolkit allows you to do. The
subsequent documentation pages provide examples of _how_ to use ML-Agents.
hypothetical, running example throughout. We will explore the problem of
training the behavior of a non-playable character (NPC) in a game. (An NPC is a
game character that is never controlled by a human player and its behavior is
pre-defined by the game developer.) More specifically, let's assume we're
building a multi-player, war-themed game in which players control the soldiers.
In this game, we have a single NPC who serves as a medic, finding and reviving
wounded players. Lastly, let us assume that there are two teams, each with five
players and one NPC medic.
location. Second, it needs to be aware of which of its team members are injured
and require assistance. In the case of multiple injuries, it needs to assess the
degree of injury and decide who to help first. Lastly, a good medic will always
place itself in a position where it can quickly help its team members. Factoring
in all of these traits means that at every instance, the medic needs to measure
several attributes of the environment (e.g. position of team members, position
of enemies, which of its team members are injured and to what degree) and then
decide on an action (e.g. hide from enemy fire, move to help one of its
members). Given the large number of settings of the environment and the large
number of actions that the medic can take, defining and implementing such
complex behaviors by hand is challenging and prone to errors.
With ML-Agents, it is possible to _train_ the behaviors of such NPCs (called
**agents**) using a variety of methods. The basic idea is quite simple. We need
to define three entities at every moment of the game (called **environment**):
  Observations can be numeric and/or visual. Numeric observations measure
  attributes of the environment from the point of view of the agent. For our
  medic this would be attributes of the battlefield that are visible to it. For
  most interesting environments, an agent will require several continuous
  numeric observations. Visual observations, on the other hand, are images
  generated from the cameras attached to the agent and represent what the agent
  is seeing at that point in time. It is common to confuse an agent's
  observation with the environment (or game) **state**. The environment state
  represents information about the entire scene containing all the game
  characters. The agent's observation, however, only contains information that
  the agent is aware of and is typically a subset of the environment state. For
  example, the medic's observation cannot include information about an enemy in
  hiding that the medic is unaware of.
- **Actions** - what actions the medic can take. Similar to observations,
  actions can either be continuous or discrete depending on the complexity of
  the environment and agent. In the case of the medic, if the environment is a
  simple grid world where only their location matters, then a discrete action
  taking on one of four values (north, south, east, west) suffices. However, if
  the environment is more complex and the medic can move freely, then using two
  continuous actions (one for direction and another for speed) is more
  appropriate.
  Note that the reward signal need not be provided at every moment, but only
  when the medic performs an action that is good or bad. For example, it can
  receive a large negative reward if it dies, a modest positive reward whenever
  it revives a wounded team member, and a modest negative reward when a wounded
  team member dies due to lack of assistance. Note that the reward signal is
  how the objectives of the task are communicated to the agent, so they need to
  be set up in a manner where maximizing reward generates the desired optimal
  behavior.
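As a concrete illustration of this kind of reward shaping, the medic's reward
signal could be expressed as a small function. The event names and magnitudes
below are purely hypothetical and not part of the toolkit:

```python
# Hypothetical reward shaping for the medic agent described above.
# Event names and reward magnitudes are illustrative, not part of ML-Agents.

def medic_reward(event: str) -> float:
    """Map a game event to a scalar reward signal."""
    rewards = {
        "medic_died": -1.0,       # large negative reward
        "teammate_revived": 0.1,  # modest positive reward
        "teammate_died": -0.1,    # modest negative reward
    }
    # Most steps produce no notable event, hence zero reward.
    return rewards.get(event, 0.0)

print(medic_reward("teammate_revived"))  # 0.1
```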
After defining these three entities (the building blocks of a **reinforcement
learning task**), we can now _train_ the medic's behavior. This is achieved by
simulating the environment for many trials where the medic, over time, learns
what is the optimal action to take for every observation it measures by
maximizing its future reward. The key is that by learning the actions that
maximize its reward, the medic is learning the behaviors that make it a good
medic (i.e. one who saves the most lives). In **reinforcement learning**
terminology, the behavior that is learned is called a **policy**, which is
essentially an (optimal) mapping from observations to actions. Note that
**training phase**, while playing the game with an NPC that is using its learned
policy is called the **inference phase**.
The ML-Agents toolkit provides all the necessary tools for using Unity as the
simulation engine for learning the policies of different objects in a Unity
environment. In the next few sections, we discuss how the ML-Agents toolkit
achieves this and what features it provides.
The ML-Agents toolkit is a Unity plugin that contains three high-level
components:

- **Learning Environment** - which contains the Unity scene and all the game
  characters.
- **Python API** - which contains all the machine learning algorithms that are
  used for training (learning a behavior or policy). Note that, unlike the
  Learning Environment, the Python API is not part of Unity, but lives outside
  and communicates with Unity through the External Communicator.
- **External Communicator** - which connects the Learning Environment with the
  Python API. It lives within the Learning Environment.
<p align="center">
<img src="images/learning_environment_basic.png"

_Simplified block diagram of ML-Agents._
The Learning Environment contains three additional components that help
organize the Unity scene:

- **Agents** - which are attached to a Unity GameObject (any character within a
  scene) and handle generating its observations, performing the actions it
  receives and assigning a reward (positive / negative) when appropriate. Each
  Agent is linked to exactly one Brain.
- **Brains** - which encapsulate the logic for making decisions for the Agent.
  In essence, the Brain is what holds on to the policy for each Agent and
  determines which actions the Agent should take at each instance. More
  specifically, it is the component that receives the observations and rewards
  from the Agent and returns an action.
- **Academy** - which orchestrates the observation and decision making process.
  Within the Academy, several environment-wide parameters such as the rendering
  quality and the speed at which the environment is run can be specified. The
  External Communicator lives within the Academy.

Every Learning Environment will always have one global Academy and one Agent for
every character in the scene. While each Agent must be linked to a Brain, it is
possible for Agents that have similar observations and actions to be linked to
the same Brain. In our sample game, we have two teams, each with their own medic.
Thus we will have two Agents in our Learning Environment, one for each medic,
but both of these medics can be linked to the same Brain. Note that these two
medics are linked to the same Brain because their _space_ of observations and
actions is similar. This does not mean that at each instance they will have
identical observation and action _values_. In other words, the Brain defines the
space of all possible observations and actions, while the Agents connected to it
(in this case the medics) can each have their own, unique observation and action
values. If we expanded our game to include tank driver NPCs, then the Agent
attached to those characters cannot share a Brain with the Agent linked to the
medics (medics and drivers have different actions).
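The distinction between a shared observation/action _space_ (defined by the
Brain) and per-Agent _values_ can be sketched as follows; the classes below
are illustrative stand-ins, not the actual ML-Agents C# components:

```python
# Illustrative sketch (not the actual ML-Agents components): a Brain defines
# the *space* of observations and actions, while each linked Agent carries
# its own observation *values*.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Brain:
    observation_size: int  # dimensions shared by every linked Agent
    action_size: int

@dataclass
class Agent:
    brain: Brain
    observations: List[float] = field(default_factory=list)

# Both medics share one Brain (same space)...
medic_brain = Brain(observation_size=8, action_size=2)
medic_team_1 = Agent(medic_brain, observations=[0.0] * 8)
medic_team_2 = Agent(medic_brain, observations=[1.0] * 8)
# ...but each holds its own observation values at any instant.
```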
<img src="images/learning_environment_example.png"
alt="Example ML-Agents Scene Block Diagram"
We have yet to discuss how the ML-Agents toolkit trains behaviors, and what role
the Python API and External Communicator play. Before we dive into those
details, let's summarize the earlier components. Each character is attached to
an Agent, and each Agent is linked to a Brain. The Brain receives observations
and rewards from the Agent and returns actions. The Academy ensures that all the
- **External** - where decisions are made using the Python API. Here, the
  observations and rewards collected by the Brain are forwarded to the Python
  API through the External Communicator. The Python API then returns the
  corresponding action that needs to be taken by the Agent.
- **Internal** - where decisions are made using an embedded
  [TensorFlow](Background-TensorFlow.md) model. The embedded TensorFlow model
  represents a learned policy and the Brain directly uses this model to
  determine the action for each Agent.
- **Player** - where decisions are made using real input from a keyboard or
  controller. Here, a human player is controlling the Agent and the
  observations and rewards collected by the Brain are not used to control the
  Agent.
- **Heuristic** - where decisions are made using hard-coded behavior. This
  resembles how most character behaviors are currently defined and can be
  helpful for debugging or comparing how an Agent with hard-coded rules
  compares to an Agent whose behavior has been trained. In our example, once we
  have trained a Brain for the medics we could assign a medic on one team to
  the trained Brain and assign the medic on the other team a Heuristic Brain
  with hard-coded behaviors. We can then evaluate which medic is more
  effective.
As currently described, it may seem that the External Communicator and Python
API are only leveraged by the External Brain. This is not true. It is possible
to configure the Internal, Player and Heuristic Brains to also send the
observations, rewards and actions to the Python API through the External
Communicator (a feature called _broadcasting_). As we will see shortly, this
enables additional training modes.
<img src="images/learning_environment.png"
alt="ML-Agents Scene Block Diagram"
border="10" />
</p>

### Built-in Training and Inference
As mentioned previously, the ML-Agents toolkit ships with several
implementations of state-of-the-art algorithms for training intelligent agents.
In this mode, the Brain type is set to External during training and Internal
during inference. More specifically, during training, all the medics in the
scene send their observations to the Python API through the External
Communicator (this is the behavior with an External Brain). The Python API
processes these observations and sends back actions for each medic to take.
During training these actions are mostly exploratory to help the Python API
learn the best policy for each medic. Once training concludes, the learned
policy for each medic can be exported. Given that all our implementations are
based on TensorFlow, the learned policy is just a TensorFlow model file. Then
during the inference phase, we switch the Brain type to Internal and include the
TensorFlow model generated from the training phase. Now during the inference
phase, the medics still continue to generate their observations, but instead of
being sent to the Python API, they will be fed into their (internal, embedded)
model to generate the _optimal_ action for each medic to take at every point in
time.
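Schematically, training is a loop in which exploratory actions and the rewards
they produce gradually improve a policy, which is then used greedily at
inference time. The sketch below is a toy stand-in, not the actual ML-Agents
Python API (see [Python-API.md](Python-API.md) for the real interface):

```python
import random

# Toy stand-in for the training loop described above (NOT the actual
# ML-Agents Python API). A hypothetical one-step environment rewards
# action 1; exploratory actions during training improve a value estimate
# that is then used greedily at inference time.
random.seed(0)

def env_step(action):
    """Stand-in for the Unity environment: returns (observation, reward)."""
    return [random.random()], 1.0 if action == 1 else 0.0

def train(num_steps=100):
    action_values = [0.0, 0.0]  # crude per-action value estimates
    for _ in range(num_steps):
        action = random.randrange(2)  # exploratory action
        _, reward = env_step(action)
        # nudge the estimate toward the observed reward signal
        action_values[action] += 0.1 * (reward - action_values[action])
    return action_values

values = train()
# At "inference" time, act greedily on what was learned:
best_action = max(range(2), key=lambda a: values[a])
```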
To summarize: our built-in implementations are based on TensorFlow, thus, during
training the Python API uses the observations it receives to learn a TensorFlow
model. This model is then embedded within the Internal Brain during inference to
generate the optimal actions for all Agents linked to that Brain. **Note that
our Internal Brain is currently experimental as it is limited to TensorFlow
models and leverages the third-party
[TensorFlowSharp](https://github.com/migueldeicaza/TensorFlowSharp) library.**
The
[Getting Started with the 3D Balance Ball Example](Getting-Started-with-Balance-Ball.md)

In the previous mode, the External Brain type was used for training to generate
a TensorFlow model that the Internal Brain type can understand and use. However,
any user of the ML-Agents toolkit can leverage their own algorithms for both
training and inference. In this case, the Brain type would be set to External
for both the training and inference phases and the behaviors of all the Agents
in the scene will be controlled within Python.
We do not currently have a tutorial highlighting this mode, but you can
learn more about the Python API [here](Python-API.md).

This mode is an extension of _Built-in Training and Inference_, and is
particularly helpful when training intricate behaviors for complex environments.
Curriculum learning is a way of training a machine learning model where more
difficult aspects of a problem are gradually introduced in such a way that the
model is always optimally challenged. This idea has been around for a long time,
and it is how we humans typically learn. If you imagine any childhood primary
school education, there is an ordering of classes and topics. Arithmetic is
taught before algebra, for example. Likewise, algebra is taught before calculus.
The skills and knowledge learned in the earlier subjects provide a scaffolding
for later lessons. The same principle can be applied to machine learning, where
training on easier tasks can provide a scaffolding for harder tasks in the
future.
<img src="images/math.png"
alt="Example Math Curriculum"
_Example of a mathematics curriculum. Lessons progress from simpler topics to
more complex ones, with each building on the last._
When we think about how reinforcement learning actually works, the learning
signal is reward received occasionally throughout training. The starting point
when training an agent to accomplish this task will be a random policy. That
starting policy will have the agent running in circles, and will likely never,
or very rarely achieve the reward for complex environments. Thus by simplifying
the environment at the beginning of training, we allow the agent to quickly
update the random policy to a more meaningful one that is successively improved
as the environment gradually increases in complexity. In our example, we can
imagine first training the medic when each team only contains one player, and
then iteratively increasing the number of players (i.e. the environment
complexity). The ML-Agents toolkit supports setting custom environment
parameters within the Academy. This allows elements of the environment related
to difficulty or complexity to be dynamically adjusted based on training
progress.
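For instance, a curriculum for the medic example might grow the team size as
the agent's mean reward crosses successive thresholds. The function below only
illustrates the idea; the threshold and team-size values are made up, and the
toolkit's actual configuration format is described in
[Training with Curriculum Learning](Training-Curriculum-Learning.md):

```python
# Purely illustrative curriculum for the medic example: increase the number
# of players per team as the agent's mean reward crosses thresholds.
# Threshold and team-size values are hypothetical.

def current_team_size(mean_reward,
                      thresholds=(0.2, 0.5, 0.8),
                      team_sizes=(1, 2, 3, 5)):
    """Pick the lesson's environment parameter from training progress."""
    lesson = sum(mean_reward >= t for t in thresholds)
    return team_sizes[lesson]

print(current_team_size(0.0))  # 1 -> start with one player per team
print(current_team_size(0.9))  # 5 -> full complexity once reward is high
```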
The [Training with Curriculum Learning](Training-Curriculum-Learning.md)
tutorial covers this training mode with the **Wall Area** sample environment.

It is often more intuitive to simply demonstrate the behavior we want an agent
to perform, rather than attempting to have it learn via trial-and-error methods.
For example, instead of training the medic by setting up its reward function,
this mode allows providing real examples from a game controller on how the medic
should behave. More specifically, in this mode, the Brain type during training
is set to Player and all the actions performed with the controller (in addition
to the agent observations) will be recorded and sent to the Python API. The
imitation learning algorithm will then use these pairs of observations and
actions from the human player to learn a policy.
[Video Link](https://youtu.be/kpb8ZkMBFYs).
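The core idea, recording (observation, action) pairs from a human player and
learning a policy that mimics them, can be illustrated with a toy sketch. The
observation and action names here are hypothetical, and the algorithm actually
used by the toolkit is considerably more sophisticated:

```python
from collections import Counter, defaultdict

# Toy illustration of imitation learning: record (observation, action)
# pairs from a human-controlled (Player Brain) session, then mimic the
# player's most frequent action for each observation. Names are hypothetical
# and the real ML-Agents algorithm is far more sophisticated.
demonstrations = [
    ("teammate_down", "move_to_teammate"),
    ("teammate_down", "move_to_teammate"),
    ("under_fire", "take_cover"),
    ("teammate_down", "revive"),
]

counts_by_obs = defaultdict(Counter)
for observation, action in demonstrations:
    counts_by_obs[observation][action] += 1

# The learned "policy": observation -> most frequently demonstrated action.
policy = {obs: c.most_common(1)[0][0] for obs, c in counts_by_obs.items()}
```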
The [Training with Imitation Learning](Training-Imitation-Learning.md) tutorial
covers this training mode with the **Banana Collector** sample environment.
While the discussion so far has mostly focused on training a single agent, with
ML-Agents, several training scenarios are possible. We are excited to see what
kinds of novel and fun environments the community creates. For those new to
training intelligent agents, below are a few examples that can serve as
inspiration:

- Single-Agent. A single Agent linked to a single Brain, with its own reward
  signal. The traditional way of training an agent. An example is any
  single-player game, such as Chicken.
  [Video Link](https://www.youtube.com/watch?v=fiQsmdwEGT8&feature=youtu.be).
- Simultaneous Single-Agent. Multiple independent Agents with independent
  reward signals linked to a single Brain. A parallelized version of the
  traditional training scenario, which can speed up and stabilize the training
  process. Helpful when you have multiple versions of the same character in an
  environment who should learn similar behaviors. An example might be training
  a dozen robot arms to each open a door simultaneously.
  [Video Link](https://www.youtube.com/watch?v=fq0JBaiCYNA).
- Adversarial Self-Play. Two interacting Agents with inverse reward signals
  linked to a single Brain. In two-player games, adversarial self-play can
  allow an agent to become increasingly more skilled, while always having the
  perfectly matched opponent: itself. This was the strategy employed when
  training AlphaGo, and more recently used by OpenAI to train a human-beating
  1-vs-1 Dota 2 agent.
- Cooperative Multi-Agent. Multiple interacting Agents with a shared reward
  signal linked to either a single or multiple different Brains. In this
  scenario, all agents must work together to accomplish a task that cannot be
  done alone. Examples include environments where each agent only has access to
  partial information, which needs to be shared in order to accomplish the task
  or collaboratively solve a puzzle.
- Competitive Multi-Agent. Multiple interacting Agents with inverse reward
  signals linked to either a single or multiple different Brains. In this
  scenario, agents must compete with one another to either win a competition,
  or obtain some limited set of resources. All team sports fall into this
  scenario.
- Ecosystem. Multiple interacting Agents with independent reward signals linked
  to either a single or multiple different Brains. This scenario can be thought
  of as creating a small world in which animals with different goals all
  interact, such as a savanna in which there might be zebras, elephants and
  giraffes, or an autonomous driving simulation within an urban environment.
Beyond the flexible training scenarios available, the ML-Agents toolkit includes additional features which improve the flexibility and interpretability of the training process:
- **On Demand Decision Making** - With the ML-Agents toolkit it is possible to
  have agents request decisions only when needed, as opposed to requesting
  decisions at every step of the environment. This enables training of
  turn-based games, games where agents must react to events, or games where
  agents can take actions of variable duration. Switching between making
  decisions at every step and making them on demand is one button click away.
  You can learn more about the on-demand-decision feature
  [here](Learning-Environment-Design-Agents.md#on-demand-decision-making).
- **Memory-enhanced Agents** - In some scenarios, agents must learn to remember
  the past in order to make the best decision. When an agent only has partial
  observability of the environment, keeping track of past observations can help
  the agent learn. We provide an implementation of _Long Short-term Memory_
  ([LSTM](https://en.wikipedia.org/wiki/Long_short-term_memory)) in our trainers
  that enables the agent to store memories to be used in future steps. You can
  learn more about enabling LSTM during training [here](Feature-Memory.md).
- **Monitoring Agent’s Decision Making** - Since communication in ML-Agents is a
  two-way street, we provide an agent Monitor class in Unity which can display
  aspects of the trained agent, such as the agent's perception of how well it is
  doing (called **value estimates**) within the Unity environment itself. By
  leveraging Unity as a visualization tool and providing these outputs in
  real-time, researchers and developers can more easily debug an agent’s
  behavior. You can learn more about using the Monitor class
  [here](Feature-Monitor.md).
- **Complex Visual Observations** - Unlike other platforms, where the agent’s
observation might be limited to a single vector or image, the ML-Agents
toolkit allows multiple cameras to be used for observations per agent. This
enables agents to learn to integrate information from multiple visual streams.
This can be helpful in several scenarios such as training a self-driving car
which requires multiple cameras with different viewpoints, or a navigational
agent which might need to integrate aerial and first-person visuals. You can
learn more about adding visual observations to an agent
[here](Learning-Environment-Design-Agents.md#multiple-visual-observations).
- **Broadcasting** - As discussed earlier, an External Brain sends the
observations for all its Agents to the Python API by default. This is helpful
for training or inference. Broadcasting is a feature which can be enabled for
the other three modes (Player, Internal, Heuristic) where the Agent
observations and actions are also sent to the Python API (despite the fact
that the Agent is **not** controlled by the Python API). This feature is
leveraged by Imitation Learning, where the observations and actions for a
Player Brain are used to learn the policies of an agent through demonstration.
However, this could also be helpful for the Heuristic and Internal Brains,
particularly when debugging agent behaviors. You can learn more about using
the broadcasting feature
[here](Learning-Environment-Design-Brains.md#using-the-broadcast-feature).
- **Docker Set-up (Experimental)** - To facilitate setting up ML-Agents without
installing Python or TensorFlow directly, we provide a
[guide](Using-Docker.md) on how to create and run a Docker container.
- **Cloud Training on AWS** - To facilitate using the ML-Agents toolkit on
Amazon Web Services (AWS) machines, we provide a
[guide](Training-on-Amazon-Web-Service.md) on how to set-up EC2 instances in
addition to a public pre-configured Amazon Machine Image (AMI).
- **Cloud Training on Microsoft Azure** - To facilitate using the ML-Agents
toolkit on Azure machines, we provide a
[guide](Training-on-Microsoft-Azure.md) on how to set-up virtual machine
instances in addition to a pre-configured data science image.
To briefly summarize: The ML-Agents toolkit enables games and simulations built
in Unity to serve as the platform for training intelligent agents. It is
designed to enable a large variety of training modes and scenarios and comes
packed with several features to enable researchers and developers to leverage
machine learning within Unity.
To help you use ML-Agents, we've created several in-depth tutorials for
[installing ML-Agents](Installation.md),
[getting started](Getting-Started-with-Balance-Ball.md) with the 3D Balance Ball
environment (one of our many
[sample environments](Learning-Environment-Examples.md)) and
[making your own](Learning-Environment-Create-New.md).

135
docs/Python-API.md


# Python API
The ML-Agents toolkit provides a Python API for controlling the agent simulation
loop of an environment or game built with Unity. This API is used by the
ML-Agents training algorithms (run with `learn.py`), but you can also write your
own Python programs using this API.
- **UnityEnvironment** — the main interface between the Unity application and
your code. Use UnityEnvironment to start and control a simulation or training
session.
- **BrainInfo** — contains all the data from agents in the simulation, such as
observations and rewards.
- **BrainParameters** — describes the data elements in a BrainInfo object. For
example, provides the array length of an observation in BrainInfo.
These classes are all defined in the `mlagents/envs` folder of the ML-Agents SDK.
To communicate with an agent in a Unity environment from a Python program, the
agent must either use an **External** brain or use a brain that is broadcasting
(has its **Broadcast** property set to true). Your code is expected to return
actions for agents with external brains, but can only observe broadcasting
brains (the information you receive for an agent is the same in both cases). See
[Using the Broadcast
Feature](Learning-Environment-Design-Brains.md#using-the-broadcast-feature).
For a simple example of using the Python API to interact with a Unity
environment, see the Basic [Jupyter](Background-Jupyter.md) notebook
(`notebooks/getting-started.ipynb`), which opens an environment, runs a few
simulation steps taking random actions, and closes the environment.
_Notice: Currently communication between Unity and Python takes place over an
open socket without authentication. As such, please make sure that the network
where training takes place is secure. This will be addressed in a future
release._
Python-side communication happens through `UnityEnvironment` which is located in
`mlagents/envs`. To load a Unity environment from a built binary file, put the
file in the same directory as `envs`. For example, if the filename of your Unity
environment is 3DBall.app, in python, run:
```python
from mlagents.envs import UnityEnvironment

env = UnityEnvironment(file_name="3DBall")
```
- `file_name` is the name of the environment binary (located in the root
directory of the python project).
- `worker_id` indicates which port to use for communication with the
environment. For use in parallel training regimes such as A3C.
- `seed` indicates the seed to use when generating random numbers during the
training process. In environments which do not involve physics calculations,
setting the seed enables reproducible experimentation by ensuring that the
environment and trainers utilize the same random seed.
If you want to directly interact with the Editor, you need to use
`file_name=None`, then press the :arrow_forward: button in the Editor when the
message _"Start training by pressing the Play button in the Unity Editor"_ is
displayed on the screen.
- **`visual_observations`** : A list of 4 dimensional numpy arrays. Matrix n of
the list corresponds to the n<sup>th</sup> observation of the brain.
- **`vector_observations`** : A two dimensional numpy array of dimension `(batch
size, vector observation size)`.
- **`text_observations`** : A list of strings corresponding to the agents' text
  observations.
- **`memories`** : A two dimensional numpy array of dimension `(batch size,
memory size)` which corresponds to the memories sent at the previous step.
- **`rewards`** : A list as long as the number of agents using the brain
containing the rewards they each obtained at the previous step.
- **`local_done`** : A list as long as the number of agents using the brain
containing `done` flags (whether or not the agent is done).
- **`max_reached`** : A list as long as the number of agents using the brain
containing true if the agents reached their max steps.
- **`agents`** : A list of the unique ids of the agents using the brain.
- **`previous_actions`** : A two dimensional numpy array of dimension `(batch
size, vector action size)` if the vector action space is continuous and
`(batch size, number of branches)` if the vector action space is discrete.
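To make these shapes concrete, here is a minimal, hypothetical stand-in mirroring the fields above (the real class is defined in `mlagents/envs`; `FakeBrainInfo` and its default sizes are assumptions for illustration only):

```python
import numpy as np

# Hypothetical stand-in mirroring the BrainInfo fields listed above;
# the real class lives in mlagents/envs.
class FakeBrainInfo:
    def __init__(self, n_agents=2, obs_size=8, action_size=2):
        self.visual_observations = []                          # list of 4-D arrays
        self.vector_observations = np.zeros((n_agents, obs_size))
        self.text_observations = [""] * n_agents
        self.memories = np.zeros((n_agents, 0))                # memories from last step
        self.rewards = [0.0] * n_agents                        # one reward per agent
        self.local_done = [False] * n_agents                   # one done flag per agent
        self.max_reached = [False] * n_agents
        self.agents = list(range(n_agents))                    # unique agent ids
        self.previous_actions = np.zeros((n_agents, action_size))

info = FakeBrainInfo()
print(info.vector_observations.shape)  # (2, 8)
```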
Once loaded, the UnityEnvironment object, referenced by a variable named `env`
in this example, can be used in the following ways:
Prints all parameters relevant to the loaded environment and the external
brains.
Sends a reset signal to the environment and returns a dictionary mapping
brain names to BrainInfo objects.
  - `train_mode` indicates whether to run the environment in train (`True`) or
    test (`False`) mode.
  - `config` is an optional dictionary of configuration flags specific to the
    environment. For generic environments, `config` can be ignored. `config` is
    a dictionary of strings to floats where the keys are the names of the
    `resetParameters` and the values are their corresponding float values.
    Define the reset parameters on the [Academy
    Inspector](Learning-Environment-Design-Academy.md#academy-properties) window
    in the Unity Editor.
Sends a step signal to the environment using the actions. For each brain:
  - `action` can be a one dimensional array, or a two dimensional array if you
    have multiple agents per brain.
  - `memory` is an optional input that can be used to send a list of floats per
    agent to be retrieved at the next step.
  - `text_action` is an optional input that can be used to send a single string
    per agent.
For example, to access the BrainInfo belonging to a brain called
'brain_name', and the BrainInfo field 'vector_observations':
```python
info = env.step(action)
brain_info = info['brain_name']
observations = brain_info.vector_observations
```
Note that if you have more than one external brain in the environment, you
must provide dictionaries from brain names to arrays for `action`, `memory`
and `value`. For example: If you have two external brains named `brain1` and
`brain2` each with one agent taking two continuous actions, then you can
have:
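A sketch of what such dictionaries could look like (the brain names and the two-continuous-action shapes follow the example above; the numeric values are arbitrary):

```python
import numpy as np

# Two external brains, one agent each, two continuous actions apiece.
action = {
    'brain1': np.array([-1.0, 0.5]),
    'brain2': np.array([0.2, 0.8]),
}
# Optional per-brain memories (empty here).
memory = {
    'brain1': np.zeros(0),
    'brain2': np.zeros(0),
}
# env.step(action, memory) would then route each array to its brain.
print(sorted(action.keys()))  # ['brain1', 'brain2']
```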
Returns a dictionary mapping brain names to BrainInfo objects.
- **Close : `env.close()`**
Sends a shutdown signal to the environment and closes the communication
socket.
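Putting these calls together, a typical interaction loop looks like the sketch below. `MockUnityEnv` is a stand-in with the same `reset`/`step`/`close` surface so the loop can be read (and run) without a Unity build; with a real build you would construct `UnityEnvironment` instead, and the brain name, observation size, and action size here are assumptions:

```python
import numpy as np

# Stand-in with the reset/step/close surface described above;
# the real UnityEnvironment lives in mlagents/envs.
class MockUnityEnv:
    def __init__(self):
        self.brain_name = 'Ball3DBrain'

    def _info(self):
        class Info:  # minimal BrainInfo-like record
            vector_observations = np.zeros((1, 8))
            rewards = [0.0]
            local_done = [False]
        return {self.brain_name: Info()}

    def reset(self, train_mode=True, config=None):
        return self._info()

    def step(self, action):
        return self._info()

    def close(self):
        pass

env = MockUnityEnv()
info = env.reset(train_mode=True)              # dict: brain name -> BrainInfo
brain = env.brain_name
for _ in range(5):
    obs = info[brain].vector_observations      # shape (n_agents, obs_size)
    action = np.random.randn(obs.shape[0], 2)  # random continuous actions
    info = env.step(action)
    if all(info[brain].local_done):            # episode over for all agents
        info = env.reset(train_mode=True)
env.close()
print(obs.shape)  # (1, 8)
```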

81
docs/Readme.md


# Unity ML-Agents Toolkit Documentation
## Installation & Set-up
* [Installation](Installation.md)
* [Background: Jupyter Notebooks](Background-Jupyter.md)
* [Docker Set-up](Using-Docker.md)
* [Basic Guide](Basic-Guide.md)
* [ML-Agents Toolkit Overview](ML-Agents-Overview.md)
* [Background: Unity](Background-Unity.md)
* [Background: Machine Learning](Background-Machine-Learning.md)
* [Background: TensorFlow](Background-TensorFlow.md)
* [Getting Started with the 3D Balance Ball Environment](Getting-Started-with-Balance-Ball.md)
* [Example Environments](Learning-Environment-Examples.md)
* [Making a New Learning Environment](Learning-Environment-Create-New.md)
* [Designing a Learning Environment](Learning-Environment-Design.md)
* [Agents](Learning-Environment-Design-Agents.md)
* [Academy](Learning-Environment-Design-Academy.md)
* [Brains](Learning-Environment-Design-Brains.md): [Player](Learning-Environment-Design-Player-Brains.md), [Heuristic](Learning-Environment-Design-Heuristic-Brains.md), [Internal & External](Learning-Environment-Design-External-Internal-Brains.md)
* [Learning Environment Best Practices](Learning-Environment-Best-Practices.md)
* [Using the Monitor](Feature-Monitor.md)
* [Using an Executable Environment](Learning-Environment-Executable.md)
* [TensorFlowSharp in Unity (Experimental)](Using-TensorFlow-Sharp-in-Unity.md)
* [Training ML-Agents](Training-ML-Agents.md)
* [Training with Proximal Policy Optimization](Training-PPO.md)
* [Training with Curriculum Learning](Training-Curriculum-Learning.md)
* [Training with Imitation Learning](Training-Imitation-Learning.md)
* [Training with LSTM](Feature-Memory.md)
* [Training on the Cloud with Amazon Web Services](Training-on-Amazon-Web-Service.md)
* [Training on the Cloud with Microsoft Azure](Training-on-Microsoft-Azure.md)
* [Using TensorBoard to Observe Training](Using-Tensorboard.md)
* [Migrating from earlier versions of ML-Agents](Migrating.md)
* [Frequently Asked Questions](FAQ.md)
* [ML-Agents Glossary](Glossary.md)
* [Limitations](Limitations.md)
* [API Reference](API-Reference.md)
* [How to use the Python API](Python-API.md)
* [Wrapping Learning Environment as a Gym](../gym-unity/Readme.md)

8
docs/Training-Curriculum-Learning.md


### Specifying a Metacurriculum
We first create a folder inside `curricula/` for the environment we want
`curricula/wall-jump/`. We will place our curriculums inside this folder.
### Specifying a Curriculum

Once our curriculum is defined, we have to use the reset parameters we defined
and modify the environment from the agent's `AgentReset()` function. See
[WallJumpAgent.cs](https://github.com/Unity-Technologies/ml-agents/blob/master/MLAgentsSDK/Assets/ML-Agents/Examples/WallJump/Scripts/WallJumpAgent.cs)
for an example. Note that if the Academy's __Max Steps__ is not set to some
positive number, the environment will never be reset. The Academy must reset
for the environment to reset.

brains---BigWallBrain and SmallWallBrain. If we want to define a curriculum for
the BigWallBrain, we will save `BigWallBrain.json` into
`curricula/wall-jump/`.
### Training with a Curriculum

2
docs/Training-Imitation-Learning.md


3. Set the "Student" brain to External mode.
4. Link the brains to the desired agents (one agent as the teacher and at least one agent as a student).
5. In `trainer_config.yaml`, add an entry for the "Student" brain. Set the `trainer` parameter of this entry to `imitation`, and the `brain_to_imitate` parameter to the name of the teacher brain: "Teacher". Additionally, set `batches_per_epoch`, which controls how much training to do each moment. Increase the `max_steps` option if you'd like to keep training the agents for a longer period of time.
6. Launch the training process with `learn --train --slow`, and press the :arrow_forward: button in Unity when the message _"Start training by pressing the Play button in the Unity Editor"_ is displayed on the screen.
7. From the Unity window, control the agent with the Teacher brain by providing "teacher demonstrations" of the behavior you would like to see.
8. Watch as the agent(s) with the student brain attached begin to behave similarly to the demonstrations.
9. Once the Student agents are exhibiting the desired behavior, end the training process with `CTL+C` from the command line.

6
docs/Training-ML-Agents.md


And then opening the URL: [localhost:6006](http://localhost:6006).
When training is finished, you can find the saved model in the `models` folder
under the assigned run-id — in the cats example, the path to the model would be
`models/cob_1/CatsOnBicycles_cob_1.bytes`.
While this example used the default training hyperparameters, you can edit the [trainer_config.yaml file](#training-config-file) with a text editor to set different values.

* `--curriculum=<file>` – Specify a curriculum JSON file for defining the lessons for curriculum training. See [Curriculum Training](Training-Curriculum-Learning.md) for more information.
* `--keep-checkpoints=<n>` – Specify the maximum number of model checkpoints to keep. Checkpoints are saved after the number of steps specified by the `save-freq` option. Once the maximum number of checkpoints has been reached, the oldest checkpoint is deleted when saving a new checkpoint. Defaults to 5.
* `--lesson=<n>` – Specify which lesson to start with when performing curriculum training. Defaults to 0.
* `--load` – If set, the training code loads an already trained model to initialize the neural network before training. The learning code looks for the model in `models/<run-id>/` (which is also where it saves models at the end of training). When not set (the default), the neural network weights are randomly initialized and an existing model is not loaded.
* `--num-runs=<n>` - Sets the number of concurrent training sessions to perform. Default is set to 1. Set to higher values when benchmarking performance and multiple training sessions is desired. Training sessions are independent, and do not improve learning performance.
* `--run-id=<path>` – Specifies an identifier for each training run. This identifier is used to name the subdirectories in which the trained model and summary statistics are saved as well as the saved model itself. The default id is "ppo". If you use TensorBoard to view the training statistics, always set a unique run-id for each training run. (The statistics for all runs with the same id are combined as if they were produced by a the same session.)
* `--save-freq=<n>` – Specifies how often (in steps) to save the model during training. Defaults to 50000.
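The option list above can be mirrored with a small `argparse` sketch. This is purely illustrative (the real `mlagents-learn` entry point may define its flags differently), but it makes the flag names and documented defaults concrete:

```python
import argparse

# Illustrative parser mirroring the documented mlagents-learn options.
# Flag names and defaults follow the docs above; this is NOT the
# trainer's actual implementation.
parser = argparse.ArgumentParser(prog="mlagents-learn")
parser.add_argument("--curriculum", metavar="<file>", default=None,
                    help="Curriculum JSON file for curriculum training.")
parser.add_argument("--keep-checkpoints", type=int, default=5,
                    help="Maximum number of model checkpoints to keep.")
parser.add_argument("--lesson", type=int, default=0,
                    help="Lesson to start with in curriculum training.")
parser.add_argument("--load", action="store_true",
                    help="Load an existing model from models/<run-id>/.")
parser.add_argument("--num-runs", type=int, default=1,
                    help="Number of concurrent, independent training sessions.")
parser.add_argument("--run-id", default="ppo",
                    help="Identifier used to name output subdirectories.")
parser.add_argument("--save-freq", type=int, default=50000,
                    help="How often (in steps) to save the model.")

# Unset flags fall back to the documented defaults.
args = parser.parse_args(["--run-id=walker-01", "--save-freq=20000"])
print(args.run_id, args.save_freq, args.keep_checkpoints)  # walker-01 20000 5
```

Note how argparse converts `--keep-checkpoints` and `--save-freq` to the attribute names `keep_checkpoints` and `save_freq`.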

8
docs/dox-ml-agents.conf


# spaces. See also FILE_PATTERNS and EXTENSION_MAPPING
# Note: If this tag is empty the current directory is searched.
INPUT = ../unity-environment/Assets/ML-Agents/Scripts/Academy.cs \
../unity-environment/Assets/ML-Agents/Scripts/Agent.cs \
../unity-environment/Assets/ML-Agents/Scripts/Monitor.cs \
../unity-environment/Assets/ML-Agents/Scripts/Decision.cs
INPUT = ../MLAgentsSDK/Assets/ML-Agents/Scripts/Academy.cs \
../MLAgentsSDK/Assets/ML-Agents/Scripts/Agent.cs \
../MLAgentsSDK/Assets/ML-Agents/Scripts/Monitor.cs \
../MLAgentsSDK/Assets/ML-Agents/Scripts/Decision.cs
# This tag can be used to specify the character encoding of the source files
# that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses

16
gym-unity/Readme.md


The returned environment `env` will function as a gym.
For more on using the gym interface, see our [Jupyter Notebook tutorial](../python/notebooks/getting-started-gym.ipynb).
For more on using the gym interface, see our [Jupyter Notebook tutorial](../notebooks/getting-started-gym.ipynb).
* It is only possible to use an environment with a single Brain.
* By default the first visual observation is provided as the `observation`, if present. Otherwise vector observations are provided.
* All `BrainInfo` output from the environment can still be accessed from the `info` provided by `env.step(action)`.
* Stacked vector observations are not supported.
* Environment registration for use with `gym.make()` is currently not supported.
* It is only possible to use an environment with a single Brain.
* By default the first visual observation is provided as the `observation`, if
present. Otherwise vector observations are provided.
* All `BrainInfo` output from the environment can still be accessed from the
`info` provided by `env.step(action)`.
* Stacked vector observations are not supported.
* Environment registration for use with `gym.make()` is currently not supported.
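The bullet points above describe the gym contract the wrapper exposes. A minimal sketch of that contract (these class names are invented for illustration and are not the real `UnityEnv` or `BrainInfo`) looks like:

```python
# Illustrative sketch only: shows the shape of the gym contract described
# above -- a single-Brain environment whose step() returns the first visual
# observation when one exists, and exposes the full BrainInfo via `info`.
class FakeBrainInfo:
    """Stand-in for the wrapper's per-step Brain data (hypothetical)."""
    def __init__(self, visual, vector, reward, done):
        self.visual_observations = visual   # list per camera, batched
        self.vector_observations = vector   # batched vector observations
        self.rewards = [reward]
        self.local_done = [done]

class SketchGymWrapper:
    def step(self, action):
        info = FakeBrainInfo(visual=[[["pixels"]]], vector=[[0.0, 1.0]],
                             reward=1.0, done=False)
        # Prefer the first visual observation if present; else vector obs.
        if info.visual_observations:
            obs = info.visual_observations[0][0]
        else:
            obs = info.vector_observations[0]
        # All BrainInfo output remains reachable through `info`.
        return obs, info.rewards[0], info.local_done[0], info

obs, reward, done, info = SketchGymWrapper().step(0)
```

Here `obs` is the first visual observation, while the vector observations are still available as `info.vector_observations`.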
## Running OpenAI Baselines Algorithms

9
ml-agents-protobuf/README.md


1. Install prerequisites.
2. Un-comment line 4 in `make.bat` and set it to the correct Grpc.Tools sub-directory.
3. Run `make.bat`.
4. Copy created `communicator_objects` and `CommunicatorObjects` folders to their respective sub-directories within the `ml-agents` repository.
* For Python, the generated files should be copied to: `python/communicator_objects`
* For C#, the generated files should be copied to: `unity-environment/Assets/ML-Agents/Scripts/CommunicatorObjects`.
4. Copy created `communicator_objects` and `CommunicatorObjects` folders to
their respective sub-directories within the `ml-agents` repository.
* For Python, the generated files should be copied to:
`mlagents/envs/communicator_objects`
* For C#, the generated files should be copied to:
`MLAgentsSDK/Assets/ML-Agents/Scripts/CommunicatorObjects`.