This guide will show you how to use a pretrained model in an example Unity
environment, and show you how to train the model yourself.
If you are not familiar with the [Unity Engine](https://unity3d.com/unity), we
highly recommend the [Roll-a-ball
tutorial](https://unity3d.com/learn/tutorials/s/roll-ball-tutorial) to learn all
the basic concepts of Unity.
In order to use the ML-Agents toolkit within Unity, you need to change some Unity settings first. You also need the [TensorFlowSharp plugin](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage), which is based on the [TensorFlowSharp repo](https://github.com/migueldeicaza/TensorFlowSharp), to use a pretrained model within Unity.
3. Using the file dialog that opens, locate the `MLAgentsSDK` folder
   within the ML-Agents toolkit project and click **Open**.
5. For **each** of the platforms you target (**PC, Mac and Linux Standalone**,
**iOS** or **Android**):
2. Set **Scripting Runtime Version** to **Experimental (.NET 4.6
   Equivalent or .NET 4.x Equivalent)**
3. In **Scripting Defined Symbols**, add the flag `ENABLE_TENSORFLOW`. After
typing in the flag name, press Enter.
[Download](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage) the TensorFlowSharp plugin. Then import it into Unity by double clicking the downloaded file. You can check that it was successfully imported by looking for the TensorFlow files in the Project window under **Assets** > **ML-Agents** > **Plugins** > **Computer**.
**Note**: If you don't see anything under **Assets**, drag the
`ml-agents/MLAgentsSDK/Assets/ML-Agents` folder under **Assets** within
Project window.
1. In the **Project** window, go to `Assets/ML-Agents/Examples/3DBall` folder and open the `3DBall` scene file.
2. In the **Hierarchy** window, select the **Ball3DBrain** child under the **Ball3DAcademy** GameObject to view its properties in the Inspector window.
3. On the **Ball3DBrain** object's **Brain** component, change the **Brain Type** to **Internal**.
4. In the **Project** window, locate the `Assets/ML-Agents/Examples/3DBall/TFModels` folder.
5. Drag the `3DBall` model file from the `TFModels` folder to the **Graph Model** field of the **Ball3DBrain** object's **Brain** component.
6. Click the **Play** button and you will see the platforms balance the balls using the pretrained model.
The `notebooks/getting-started.ipynb` [Jupyter notebook](Background-Jupyter.md)
contains a simple walkthrough of the functionality of the Python API. It can
also serve as a simple test that your environment is configured correctly.
Within the notebook, be sure to set `env_name` to the name of the Unity executable
if you want to [use an executable](Learning-Environment-Executable.md) or to
`None` if you want to interact with the current scene in the Unity Editor.
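For instance, the relevant notebook cell might look like the following sketch (the executable name `3DBall` is only an example):

```python
# Example values only; use the name of your own executable, or None for the Editor.
env_name = "3DBall"   # launch a built Unity executable named 3DBall
# env_name = None     # or interact with the current scene in the Unity Editor
```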
Since we are going to build this environment to conduct training, we need to set
the brain used by the agents to **External**. This allows the agents to
1. In the **Scene** window, click the triangle icon next to the Ball3DAcademy
object.
2. Select its child object **Ball3DBrain**.
3. In the Inspector window, set **Brain Type** to **External**.
1. Open a command or terminal window.
2. Navigate to the folder where you installed the ML-Agents toolkit.
3. Run `learn <trainer-config-path> --run-id=<run-identifier> --train` (see the
   example after these steps), where:
- `<trainer-config-path>` is the relative or absolute filepath of the
trainer configuration. The defaults used by environments in the ML-Agents
SDK can be found in `trainer_config.yaml`.
- `<run-identifier>` is a string used to separate the results of different
training runs
- and `--train` tells it to run a training session (rather than
  inference).
4. When the message _"Start training by pressing the Play button in the Unity
Editor"_ is displayed on the screen, you can press the :arrow_forward: button
in Unity to start training in the Editor.
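For example, assuming the default `trainer_config.yaml` and an illustrative run identifier of `firstRun`, the command might look like this:

```shell
learn trainer_config.yaml --run-id=firstRun --train
```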
**Note**: Alternatively, you can use an executable rather than the Editor to
perform training. Please refer to [this
page](Learning-Environment-Executable.md) for instructions on how to build and
use an executable.
**Note**: If you're using Anaconda, don't forget to activate the ml-agents environment first.
If learn.py runs correctly and starts training, you should see something like this:
You can press Ctrl+C to stop the training, and your trained model will be at `ml-agents/python/models/<run-identifier>/editor_<academy_name>_<run-identifier>.bytes` where `<academy_name>` is the name of the Academy GameObject in the current scene. This file corresponds to your model's latest checkpoint. You can now embed this trained model into your internal brain by following the steps below, which are similar to the steps described [above](#play-an-example-environment-using-pretrained-model).
5. Drag the `<env_name>_<run-identifier>.bytes` file from the Project window of the Editor
to the **Graph Model** placeholder in the **Ball3DBrain** inspector window.
* For more information on the ML-Agents toolkit, in addition to helpful background, check out the [ML-Agents Toolkit Overview](ML-Agents-Overview.md) page.
* For a more detailed walk-through of our 3D Balance Ball environment, check out the [Getting Started](Getting-Started-with-Balance-Ball.md) page.
* For a "Hello World" introduction to creating your own learning environment, check out the [Making a New Learning Environment](Learning-Environment-Create-New.md) page.
* For a series of YouTube video tutorials, check out the [Machine Learning Agents PlayList](https://www.youtube.com/playlist?list=PLX2vGYjWbI0R08eWQkO7nQkGiicHAX7IX) page.
## Scripting Runtime Environment not set up correctly
If you haven't switched your scripting runtime version from .NET 3.5 to .NET 4.6
or .NET 4.x, you will see an error message like this:
This is because .NET 3.5 doesn't support the Clear() method on StringBuilder;
refer to [Setting Up The ML-Agents Toolkit Within
Unity](Installation.md#setting-up-ml-agent-within-unity) for the solution.
## TensorFlowSharp flag not turned on
If you have already imported the TensorFlowSharp plugin but haven't set the
ENABLE_TENSORFLOW flag in your scripting define symbols, you will see the
following error message:
You need to install and enable the TensorFlowSharp plugin in order to use the internal brain.
This error message occurs because the TensorFlowSharp plugin won't be usable
without the ENABLE_TENSORFLOW flag; refer to [Setting Up The ML-Agents Toolkit
Within Unity](Installation.md#setting-up-ml-agent-within-unity) for the solution.
## Tensorflow epsilon placeholder error
If you have a graph placeholder set in the internal Brain inspector that is not
present in the TensorFlow graph, you will see an error like this:
UnityAgentsException: One of the Tensorflow placeholder could not be found. In brain <some_brain_name>, there are no FloatingPoint placeholder named <some_placeholder_name>.
Solution: Go to all of your Brain objects, find `Graph placeholders`, and change
the `size` to 0 to remove the `epsilon` placeholder.
Similarly, if you have a graph scope set in the internal Brain inspector that is
not correctly set, you will see an error like this:
Solution: Make sure your Graph Scope field matches the corresponding brain
object name in your Hierarchy when there are multiple brains.
## Environment Permission Error
If you directly import your Unity environment without building it in the
editor, you might need to give it additional permissions to execute it.
```shell
chmod -R 755 *.app
```
```shell
chmod -R 755 *.x86_64
```
On Windows, you can find
## Environment Connection Timeout
If you are able to launch the environment from `UnityEnvironment` but
then receive a timeout error, there may be a number of possible causes.
* _Cause_: There may be no Brains in your environment which are set
to `External`. In this case, the environment will not attempt to
communicate with python. _Solution_: Set the Brain(s) you wish to
externally control through the Python API to `External` from the
Unity Editor, and rebuild the environment.
* _Cause_: On OSX, the firewall may be preventing communication with
the environment. _Solution_: Add the built environment binary to the
# Getting Started with the 3D Balance Ball Environment
This tutorial walks through the end-to-end process of opening an ML-Agents
toolkit example environment in Unity, building the Unity executable, training an
agent in it, and finally embedding the trained model into the Unity environment.
The ML-Agents toolkit includes a number of [example
environments](Learning-Environment-Examples.md) which you can examine to help
understand the different ways in which the ML-Agents toolkit can be used. These
environments can also serve as templates for new environments or as ways to test
new ML algorithms. After reading this tutorial, you should be able to explore
and build the example environments.
This walk-through uses the **3D Balance Ball** environment. 3D Balance Ball
contains a number of platforms and balls (which are all copies of each other).
Each platform tries to keep its ball from falling by rotating either
horizontally or vertically. In this environment, a platform is an **agent** that
receives a reward for every step that it balances the ball. An agent is also
penalized with a negative reward for dropping the ball. The goal of the training
process is to have the platforms learn to never drop the ball.
In order to install and set up the ML-Agents toolkit, the Python dependencies
and Unity, see the [installation instructions](Installation.md).
An agent is an autonomous actor that observes and interacts with an
_environment_. In the context of Unity, an environment is a scene containing an
Academy and one or more Brain and Agent objects, and, of course, the other
**Note:** In Unity, the base object of everything in a scene is the
_GameObject_. The GameObject is essentially a container for everything else,
including behaviors, graphics, physics, etc. To see the components that make up
a GameObject, select the GameObject in the Scene window, and open the Inspector
window. The Inspector shows every component on a GameObject.
The first thing you may notice after opening the 3D Balance Ball scene is that
it contains not one, but several platforms. Each platform in the scene is an
independent agent, but they all share the same brain. 3D Balance Ball does this
The Academy object for the scene is placed on the Ball3DAcademy GameObject. When
you look at an Academy component in the inspector, you can see several
properties that control how the environment works. For example, the **Training**
and **Inference Configuration** properties set the graphics and timescale
properties for the Unity application. The Academy uses the **Training
Configuration** during training and the **Inference Configuration** when not
training. (*Inference* means that the agent is using a trained model or
heuristics or direct control — in other words, whenever **not** training.)
Typically, you set low graphics quality and a high time scale for the **Training
Configuration** and a high graphics quality and the timescale to `1.0` for the
**Inference Configuration**.
**Note:** if you want to observe the environment during training, you can adjust
the **Inference Configuration** settings to use a larger window and a timescale
closer to 1:1. Be sure to set these parameters back when training in earnest;
otherwise, training can take a very long time.
Another aspect of an environment to look at is the Academy implementation. Since
the base Academy class is abstract, you must always define a subclass. There are
three functions you can implement, though they are all optional:
* Academy.AcademyStep() — Called at every simulation step before
Agent.AgentAction() (and after the agents collect their observations).
* Academy.AcademyReset() — Called when the Academy starts or restarts the
simulation (including the first time).
The 3D Balance Ball environment does not use these functions — each agent resets
itself when needed — but many environments do use these functions to control the
environment around the agents.
The Ball3DBrain GameObject in the scene, which contains a Brain component, is a
child of the Academy object. (All Brain objects in a scene must be children of
the Academy.) All the agents in the 3D Balance Ball environment use the same
Brain instance. A Brain doesn't store any information about an agent; it just
routes the agent's collected observations to the decision making process and
returns the chosen action to the agent. Thus, all agents can share the same
brain, but act independently. The Brain settings tell you quite a bit about how
an agent works.
The **Brain Type** determines how an agent makes its decisions. The **External**
and **Internal** types work together — use **External** when training your
agents; use **Internal** when using the trained model. The **Heuristic** brain
allows you to hand-code the agent's logic by extending the Decision class.
Finally, the **Player** brain lets you map keyboard commands to actions, which
can be useful when testing your agents and environment. If none of these types
of brains do what you need, you can implement your own CoreBrain to create your
own type.
In this tutorial, you will set the **Brain Type** to **External** for training;
#### Vector Observation Space
Before making a decision, an agent collects its observation about its state in
the world. The vector observation is a vector of floating point numbers which
contain relevant information for the agent to make decisions.
The Brain instance used in the 3D Balance Ball example uses the **Continuous**
vector observation space with a **State Size** of 8. This means that the feature
vector containing the agent's observations contains eight elements: the `x` and
`z` components of the platform's rotation and the `x`, `y`, and `z` components
of the ball's relative position and velocity. (The observation values are
defined in the agent's `CollectObservations()` function.)
#### Vector Action Space
An agent is given instructions from the brain in the form of *actions*.
The ML-Agents toolkit classifies actions into two types: the **Continuous** vector
action space is a vector of numbers that can vary continuously. What each
element of the vector means is defined by the agent logic (the PPO training
process just learns what values are better given particular state observations
based on the rewards received when it tries different values). For example, an
element might represent a force or torque applied to a `RigidBody` in the agent.
The **Discrete** action vector space defines its actions as tables. An action
given to the agent is an array of indices into tables.
space.
You can try training with both settings to observe whether there is a
difference. (Set the `Vector Action Space Size` to 4 when using the discrete
action space.)
The Agent is the actor that observes and takes actions in the environment. In
the 3D Balance Ball environment, the Agent components are placed on the twelve
Platform GameObjects. The base Agent object has a few properties that affect its
behavior:
* **Brain** — Every agent must have a Brain. The brain determines how an agent
makes decisions. All the agents in the 3D Balance Ball scene share the same
brain.
observe its environment. 3D Balance Ball does not use camera observations.
* **Max Step** — Defines how many simulation steps can occur before the agent
decides it is done. In 3D Balance Ball, an agent restarts after 5000 steps.
3D Balance Ball sets this true so that the agent restarts after reaching the
**Max Step** count or after dropping the ball.
Perhaps the more interesting aspect of an agent is the Agent subclass
implementation. When you create an agent, you must extend the base Agent class.
* Agent.AgentReset() — Called when the Agent resets, including at the beginning
of a session. The Ball3DAgent class uses the reset function to reset the
platform and ball. The function randomizes the reset values so that the
training generalizes to more than a specific starting position and platform
attitude.
* Agent.CollectObservations() — Called every simulation step. Responsible for
collecting the agent's observations of the environment. Since the Brain
instance assigned to the agent is set to the continuous vector observation
space with a state size of 8, the `CollectObservations()` must call
`AddVectorObs` 8 times.
by the brain. The Ball3DAgent example handles both the continuous and the
discrete action space types. There isn't actually much difference between the
two state types in this environment — both vector action spaces result in a
small change in platform rotation at each step. The `AgentAction()` function
assigns a reward to the agent; in this example, an agent receives a small
positive reward for each step it keeps the ball on the platform and a larger,
negative reward for dropping the ball. An agent is also marked as done when it
drops the ball so that it will reset with a new ball for the next simulation
step.
## Training the Brain with Reinforcement Learning
In order to train an agent to correctly balance the ball, we will use a
Reinforcement Learning algorithm called Proximal Policy Optimization (PPO). This
is a method that has been shown to be safe, efficient, and more general purpose
than many other RL algorithms; as such, we have chosen it as the example
algorithm for use with the ML-Agents toolkit. For more information on PPO, OpenAI
has a recent [blog post](https://blog.openai.com/openai-baselines-ppo/)
To train the agents within the Ball Balance environment, we will be using the
python package. We have provided a convenient Python wrapper script called
`learn.py` which accepts arguments used to configure both training and inference
phases.
We can use `run_id` to identify the experiment and create a folder where the
model and summary statistics are stored. When using TensorBoard to observe the
training statistics, it helps to set this to a sequential value for each
training run. In other words, "BalanceBall1" for the first run, "BalanceBall2"
for the second, and so on. If you don't, the summaries for every training run are
saved to the same directory and will all be included on the same graph.
To summarize, go to your command line, enter the `ml-agents` directory and type:
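A sketch of what that command might look like, assuming the `learn` wrapper and default `trainer_config.yaml` described in the Basic Guide, with `BalanceBall1` as an example run identifier:

```shell
learn trainer_config.yaml --run-id=BalanceBall1 --train
```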
When the message _"Start training by pressing the Play button in the Unity Editor"_ is displayed on the screen, you can press the :arrow_forward: button in Unity to start training in the Editor.
**Note**: If you're using Anaconda, don't forget to activate the ml-agents environment first.
When the message _"Start training by pressing the Play button in the Unity
Editor"_ is displayed on the screen, you can press the :arrow_forward: button in
Unity to start training in the Editor.
The `--train` flag tells the ML-Agents toolkit to run in training mode.
**Note**: You can train using an executable rather than the Editor. To do so,
follow the instructions in [Using an
Executable](Learning-Environment-Executable.md).
Once you start training using `learn.py` in the way described in the previous
section, the `ml-agents` directory will contain a `summaries` directory. In
order to observe the training process in more detail, you can use TensorBoard.
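Assuming the summaries are written to a `summaries` folder inside the `ml-agents` directory, you can typically launch TensorBoard from that directory like this and then open `localhost:6006` in a browser:

```shell
tensorboard --logdir=summaries
```

TensorBoard displays summary statistics such as the following as training progresses: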
* Lesson - only interesting when performing [curriculum
training](Training-Curriculum-Learning.md). This is not used in the 3D Balance
Ball environment.
* Cumulative Reward - The mean cumulative episode reward over all agents. Should
increase during a successful training session.
* Entropy - How random the decisions of the model are. Should slowly decrease
during a successful training process. If it decreases too quickly, the `beta`
hyperparameter should be increased.
* Episode Length - The mean length of each episode in the environment for all
agents.
* Learning Rate - How large a step the training algorithm takes as it searches
for the optimal policy. Should decrease over time.
much the policy (process for deciding actions) is changing. The magnitude of
this should decrease during a successful training session.
* Value Estimate - The mean value estimate for all states visited by the agent.
Should increase during a successful training session.
well the model is able to predict the value of each state. This should
decrease during a successful training session.
Once the training process completes and saves the model (denoted by the `Saved
Model` message), you can add it to the Unity project and use it with agents
having an **Internal** brain type.
**Note:** Do not just close the Unity Window once the `Saved Model` message appears. Either wait for the training process to close the window or press Ctrl+C at the command-line prompt. If you simply close the window manually, the .bytes file containing the trained model is not exported into the ml-agents folder.
Because TensorFlowSharp support is still experimental, it is disabled by
default. In order to enable it, you must follow these steps. Please note that
To set up the TensorFlowSharp Support, follow the [Setting up ML-Agents Toolkit
within Unity](Basic-Guide.md#setting-up-ml-agents-within-unity) section of the
Basic Guide page.
To embed the trained model into Unity, follow the later part of the [Training the Brain with Reinforcement Learning](Basic-Guide.md#training-the-brain-with-reinforcement-learning) section of the Basic Guide page.
_Linux Build Support_ component when installing Unity.
<palign="center">
<imgsrc="images/unity_linux_build_support.png"
alt="Linux Build Support"
<imgsrc="images/unity_linux_build_support.png"
alt="Linux Build Support"
Once installed, you will want to clone the ML-Agents Toolkit GitHub repository.
The `MLAgentsSDK` directory in this repository contains the Unity Assets
Both directories are located at the root of the repository.
the dependencies listed in the [requirements file](../requirements.txt).
- [TensorFlow](Background-TensorFlow.md)
- [Jupyter](Background-Jupyter.md)
**NOTES**
- If you are using Anaconda and are having trouble with TensorFlow, please see the following [note](https://www.tensorflow.org/install/install_mac#installing_with_anaconda) on how to install TensorFlow in an Anaconda environment.
If you are a Windows user who is new to Python and TensorFlow, follow [this guide](Installation-Windows.md) to set up your Python environment.
[Download](https://www.python.org/downloads/) and install Python 3 if you do not already have it.
If your Python environment doesn't include `pip`, see these
To install dependencies, **go into the `python` subdirectory** of the repository,
and run from the command line:
If you'd like to use Docker for ML-Agents, please follow
[this guide](Using-Docker.md).
The [Basic Guide](Basic-Guide.md) page contains several short
tutorials on setting up the ML-Agents toolkit within Unity, running a pre-trained model, in
If you run into any problems regarding ML-Agents, refer to our [FAQ](FAQ.md) and our [Limitations](Limitations.md) pages. If you can't find anything please
make sure to cite relevant information on OS, Python version, and exact error
message (whenever possible).
2. In a file system window, navigate to the folder containing your cloned ML-Agents repository.
3. Drag the `ML-Agents` folder from `MLAgentsSDK/Assets` to the Unity Editor Project window.
Your Unity **Project** window should contain the following assets:
Press **Play** to run the scene and use the WASD keys to move the agent around the platform. Make sure that there are no errors displayed in the Unity editor Console window and that the agent resets when it reaches its target or falls from the platform. Note that for more involved debugging, the ML-Agents SDK includes a convenient Monitor class that you can use to easily display agent status information in the Game window.
One additional test you can perform is to first ensure that your environment and
the Python API work as expected using the `notebooks/getting-started.ipynb`
[Jupyter notebook](Background-Jupyter.md). Within the notebook, be sure to set
`env_name` to the name of the environment file you specify when building this
environment.
Now you can train the Agent. To get ready for training, you must first change the **Brain Type** from **Player** to **External**. From there, the process is the same as described in [Training ML-Agents](Training-ML-Agents.md).
2. On the Projects dialog, choose the **Open** option at the top of the window.
3. Using the file dialog that opens, locate the `MLAgentsSDK` folder
within the ML-Agents project and click **Open**.
4. In the **Project** window, navigate to the folder
`Assets/ML-Agents/Examples/3DBall/`.
![Training running](images/training-running.png)
You can press Ctrl+C to stop the training, and your trained model will be at `models/<run-identifier>/<env_name>_<run-identifier>.bytes`, which corresponds to your model's latest checkpoint. You can now embed this trained model into your internal brain by following the steps below:
The ML-Agents toolkit provides a Python API for controlling the agent simulation
loop of an environment or game built with Unity. This API is used by the ML-Agents
training algorithms (run with `learn.py`), but you can also write your Python
programs using this API.
- **UnityEnvironment** — the main interface between the Unity application and
your code. Use UnityEnvironment to start and control a simulation or training
session.
- **BrainInfo** — contains all the data from agents in the simulation, such as
observations and rewards.
- **BrainParameters** — describes the data elements in a BrainInfo object. For
example, provides the array length of an observation in BrainInfo.
These classes are all defined in the `mlagents/envs` folder of the ML-Agents SDK.
To communicate with an agent in a Unity environment from a Python program, the
agent must either use an **External** brain or use a brain that is broadcasting
(has its **Broadcast** property set to true). Your code is expected to return
actions for agents with external brains, but can only observe broadcasting
brains (the information you receive for an agent is the same in both cases). See
[Using the Broadcast Feature](Learning-Environment-Design-Brains.md#using-the-broadcast-feature).
For a simple example of using the Python API to interact with a Unity
environment, see the Basic [Jupyter](Background-Jupyter.md) notebook
(`notebooks/getting-started.ipynb`), which opens an environment, runs a few
simulation steps taking random actions, and closes the environment.
_Notice: Currently communication between Unity and Python takes place over an
open socket without authentication. As such, please make sure that the network
where training takes place is secure. This will be addressed in a future
release._
Python-side communication happens through `UnityEnvironment` which is located in
`mlagents/envs`. To load a Unity environment from a built binary file, put the
file in the same directory as `envs`. For example, if the filename of your Unity
environment is 3DBall.app, in python, run:
```python
from mlagents.envs import UnityEnvironment
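# Illustrative values; adjust file_name, worker_id, and seed for your setup.
# Use file_name=None to interact with the current scene in the Unity Editor.
env = UnityEnvironment(file_name="3DBall", worker_id=0, seed=1)
```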
- `file_name` is the name of the environment binary (located in the root
directory of the python project).
- `worker_id` indicates which port to use for communication with the
environment. For use in parallel training regimes such as A3C.
- `seed` indicates the seed to use when generating random numbers during the
training process. In environments which do not involve physics calculations,
setting the seed enables reproducible experimentation by ensuring that the
environment and trainers utilize the same random seed.
If you want to directly interact with the Editor, you need to use
`file_name=None`, then press the :arrow_forward: button in the Editor when the
message _"Start training by pressing the Play button in the Unity Editor"_ is
displayed on the screen.
- **`visual_observations`** : A list of 4 dimensional numpy arrays. Matrix n of
the list corresponds to the n<sup>th</sup> observation of the brain.
- **`vector_observations`** : A two dimensional numpy array of dimension `(batch
size, vector observation size)`.
- **`text_observations`** : A list of strings corresponding to the agents' text
  observations.
- **`memories`** : A two dimensional numpy array of dimension `(batch size,
memory size)` which corresponds to the memories sent at the previous step.
- **`rewards`** : A list as long as the number of agents using the brain
containing the rewards they each obtained at the previous step.
- **`local_done`** : A list as long as the number of agents using the brain
containing `done` flags (whether or not the agent is done).
- **`max_reached`** : A list as long as the number of agents using the brain
containing true if the agents reached their max steps.
- **`agents`** : A list of the unique ids of the agents using the brain.
- **`previous_actions`** : A two dimensional numpy array of dimension `(batch
size, vector action size)` if the vector action space is continuous and
`(batch size, number of branches)` if the vector action space is discrete.
Once loaded, the UnityEnvironment object, referenced by a variable named `env`
in this example, can be used in the following way:
Prints all parameters relevant to the loaded environment and the external brains.
Sends a reset signal to the environment and provides a dictionary mapping brain names to BrainInfo objects.
- `train_mode` indicates whether to run the environment in train (`True`) or test (`False`) mode.
- `config` is an optional dictionary of configuration flags specific to the environment. For generic environments, `config` can be ignored. `config` is a dictionary of strings to floats where the keys are the names of the `resetParameters` and the values are their corresponding float values. Define the reset parameters on the [Academy Inspector](Learning-Environment-Design-Academy.md#academy-properties) window in the Unity Editor.
Sends a step signal to the environment using the actions. For each brain:
- `action` can be one dimensional arrays or two dimensional arrays if you have
  multiple agents per brain.
- `memory` is an optional input that can be used to send a list of floats per
  agent to be retrieved at the next step.
- `text_action` is an optional input that can be used to send a single string per
  agent.
For example, to access the BrainInfo belonging to a brain called
'brain_name', and the BrainInfo field 'vector_observations':
```python
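# A sketch; 'brain_name' is a placeholder for the name of your external brain,
# and env is the UnityEnvironment created earlier.
info = env.reset()                            # dict mapping brain names to BrainInfo
brain_info = info['brain_name']               # BrainInfo for that brain
vector_obs = brain_info.vector_observations   # numpy array: (batch size, observation size)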
```
Note that if you have more than one external brain in the environment, you must provide dictionaries from brain names to arrays for `action`, `memory` and