- [Decisions](#decisions)
- [Observations and Sensors](#observations-and-sensors)
  - [Generating Observations](#generating-observations)
    - [Agent.CollectObservations()](#agentcollectobservations)
    - [Observable Fields and Properties](#observable-fields-and-properties)
    - [ISensor interface and SensorComponents](#isensor-interface-and-sensorcomponents)
  - [Stacking](#stacking)
  - [Vector Observation Summary & Best Practices](#vector-observation-summary--best-practices)
  - [Visual Observations](#visual-observations)
    - [Visual Observation Summary & Best Practices](#visual-observation-summary--best-practices)

write your own Policy. If the Agent has a `Model` file, its Policy will use the
neural network `Model` to make decisions.

When you create an Agent, you should usually extend the base Agent class. This
includes implementing the following methods:

- `Agent.OnEpisodeBegin()` — Called when the Agent resets at the start of an
  episode, including at the beginning of the simulation.
- `Agent.CollectObservations(VectorSensor sensor)` — Called every step that the
  Agent requests a decision. This is one possible way for collecting the
  Agent's observations of the environment; see
  [Generating Observations](#generating-observations) below for more options.
- `Agent.OnActionReceived()` — Called every time the Agent receives an action
  to take. Receives the action chosen by the Agent. It is also common to
  assign a reward in this method.
- `Agent.Heuristic()` — When the `Behavior Type` is set to `Heuristic Only` in
  the Behavior Parameters of the Agent, the Agent will use the `Heuristic()`
  method to generate its actions. As such, the `Heuristic()` method writes to
  a provided array of floats.

As a concrete example, here is how the Ball3DAgent class implements these
methods:

- `Agent.OnEpisodeBegin()` — Resets the agent cube and ball to their starting
  positions. The function randomizes the reset values so that the training
  generalizes to more than a specific starting position and agent cube
  orientation.
- `Agent.CollectObservations(VectorSensor sensor)` — Adds information about the
  orientation of the agent cube, the ball velocity, and the relative position
  between the ball and the cube. Since the `CollectObservations()` method calls
  `VectorSensor.AddObservation()` such that the vector size adds up to 8, the
  Behavior Parameters of the Agent are set with a vector observation space with
  a state size of 8.
- `Agent.OnActionReceived()` — The action results in a small change in the
  agent cube's rotation at each step. In this example, an Agent receives a
  small positive reward for each step it keeps the ball on the agent cube's
  head and a larger, negative reward for dropping the ball. An Agent's episode
  is also ended when it drops the ball so that it will reset with a new ball
  for the next simulation step.
- `Agent.Heuristic()` - Converts the keyboard inputs into actions.
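
For illustration, here is a condensed sketch of two of these overrides, loosely
based on the 3DBall sample; the `ball` and `m_BallRb` fields and the exact
reset ranges are assumptions for this sketch, not the sample's exact code:

```csharp
public override void OnEpisodeBegin()
{
    // Start each episode from a randomized cube tilt so training
    // generalizes beyond a single starting attitude.
    gameObject.transform.rotation = Quaternion.identity;
    gameObject.transform.Rotate(Vector3.right, Random.Range(-10f, 10f));
    gameObject.transform.Rotate(Vector3.forward, Random.Range(-10f, 10f));

    // Re-place the ball above the cube with zeroed velocity.
    m_BallRb.velocity = Vector3.zero;
    ball.transform.position =
        new Vector3(Random.Range(-1.5f, 1.5f), 4f, Random.Range(-1.5f, 1.5f))
        + gameObject.transform.position;
}

public override void Heuristic(float[] actionsOut)
{
    // Map the standard input axes onto the two continuous actions.
    actionsOut[0] = -Input.GetAxis("Horizontal");
    actionsOut[1] = Input.GetAxis("Vertical");
}
```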

## Decisions

Agents request decisions with `Agent.RequestDecision()`. If you want an Agent
to request decisions on its own at regular intervals, add a `Decision
Requester` component to its GameObject; otherwise, your own code should call
`Agent.RequestDecision()` manually.
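
For instance, a turn-based game might only request a decision when it becomes
the agent's turn. A minimal sketch, where `OnTurnStarted` is a hypothetical
hook called by your game code:

```csharp
using Unity.MLAgents;

public class TurnBasedAgent : Agent
{
    // Hypothetical hook invoked by the game's turn manager.
    public void OnTurnStarted()
    {
        // Ask the Policy for an action; OnActionReceived() will be
        // called with the result during the next Academy step.
        RequestDecision();
    }
}
```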

## Observations and Sensors

To make informed decisions, an agent must first make observations of the state
of the environment. The observations are collected by Sensors attached to the
agent GameObject. By default, agents come with a `VectorSensor` which allows
them to collect floating-point observations into a single array. There are
additional sensor components that can be attached to the agent GameObject to
collect their own observations, or modify other observations. These are:

- `CameraSensorComponent` - Allows an image from a `Camera` to be used as an
  observation.
- `RenderTextureSensorComponent` - Allows the content of a `RenderTexture` to
  be used as an observation.
- `RayPerceptionSensorComponent` - Allows information from a set of ray casts
  to be used as an observation.
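
These components are usually added and configured in the Inspector, but they
can also be attached from code. A sketch, assuming the 3D ray variant and that
the `RaysPerDirection` and `RayLength` property names match your installed
release:

```csharp
using Unity.MLAgents.Sensors;
using UnityEngine;

public class AttachRaySensor : MonoBehaviour
{
    void Awake()
    {
        // Attach a ray-cast based sensor to the agent's GameObject.
        var rays = gameObject.AddComponent<RayPerceptionSensorComponent3D>();
        rays.RaysPerDirection = 3; // rays on each side of the center ray
        rays.RayLength = 20f;      // how far each ray is cast
    }
}
```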

In order for an agent to learn, the observations should include all the
information an agent needs to accomplish its task. Without sufficient and
relevant information, an agent may learn poorly or may not learn at all. A
reasonable approach for determining what information should be included is to
consider what you would need to calculate an analytical solution to the
problem, or what you would expect a human to be able to use to solve the
problem.

### Generating Observations

ML-Agents provides multiple ways for an Agent to make observations:

1. Overriding the `Agent.CollectObservations()` method and passing the
   observations to the provided `VectorSensor`.
1. Adding the `[Observable]` attribute to fields and properties on the Agent.
1. Implementing the `ISensor` interface, using a `SensorComponent` attached to
   the Agent to create the `ISensor`.

#### Agent.CollectObservations()

Agent.CollectObservations() is best used for aspects of the environment which
are numerical and non-visual.

The observation feature vector is a list of floating point numbers, which means
you must convert any other data types to a float or a list of floats. The
`VectorSensor.AddObservation` method provides a number of overloads for adding
common types of data to your observation vector. You can add integers and
booleans directly to the observation vector, as well as some common Unity data
types such as `Vector2`, `Vector3`, and `Quaternion`.
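
Each overload expands its argument to a fixed number of floats. A small
illustration (the `m_HasKey` and `m_AmmoCount` fields are hypothetical):

```csharp
public override void CollectObservations(VectorSensor sensor)
{
    sensor.AddObservation(m_HasKey);           // bool: 1 float (0 or 1)
    sensor.AddObservation(m_AmmoCount);        // int: 1 float
    sensor.AddObservation(transform.position); // Vector3: 3 floats
    sensor.AddObservation(transform.rotation); // Quaternion: 4 floats
}
```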

For example, the 3DBall example uses the orientation of the agent cube, the
ball velocity, and the relative position between the ball and the cube as its
state observation:

```csharp
public override void CollectObservations(VectorSensor sensor)
{
    // Orientation of the cube (2 floats)
    sensor.AddObservation(gameObject.transform.rotation.z);
    sensor.AddObservation(gameObject.transform.rotation.x);
    // Relative position of the ball to the cube (3 floats)
    sensor.AddObservation(ball.transform.position - gameObject.transform.position);
    // Velocity of the ball (3 floats)
    sensor.AddObservation(m_BallRb.velocity);
    // 8 floats total
}
```

As an experiment, you can remove the velocity components from the observation
and retrain the 3DBall agent. While it will learn to balance the ball
reasonably well, the performance of the agent without using velocity is
noticeably worse.

The observations passed to `VectorSensor.AddObservation()` must always contain
the same number of elements and must always be in the same order. If the number
of observed entities in an environment can vary, you can pad the calls with
zeros for any missing entities in a specific observation, or you can limit an
agent's observations to a fixed subset. For example, instead of observing every
enemy in an environment, you could only observe the closest five, as sketched
below.
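
A sketch of the padding approach, as a fragment inside `CollectObservations()`;
`GetClosestEnemies` is a hypothetical helper returning a distance-sorted list
with at most `MaxEnemies` entries:

```csharp
const int MaxEnemies = 5;

public override void CollectObservations(VectorSensor sensor)
{
    List<GameObject> enemies = GetClosestEnemies(MaxEnemies);
    for (int i = 0; i < MaxEnemies; i++)
    {
        if (i < enemies.Count)
        {
            // Relative position of a real enemy (3 floats).
            sensor.AddObservation(
                enemies[i].transform.position - transform.position);
        }
        else
        {
            // Pad with zeros so the vector size stays fixed.
            sensor.AddObservation(Vector3.zero);
        }
    }
}
```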

Additionally, when you set up an Agent's `Behavior Parameters` in the Unity
Editor, you must set the **Vector Observations > Space Size** to equal the
number of floats that are written by `CollectObservations()`.

#### Observable Fields and Properties

Another approach is to define the relevant observations as fields or properties
on your Agent class, and annotate them with an `ObservableAttribute`. For
example, in the 3DBall example above, the rigid body velocity could be observed
by adding a property to the Agent:

```csharp
using Unity.MLAgents.Sensors.Reflection;

public class Ball3DAgent : Agent {

    [Observable]
    public Vector3 RigidBodyVelocity
    {
        get { return m_BallRb.velocity; }
    }
}
```

`ObservableAttribute` currently supports most basic types (e.g. floats, ints,
bools), as well as `Vector2`, `Vector3`, `Vector4`, `Quaternion`, and enums.

The behavior of `ObservableAttribute`s is controlled by the "Observable
Attribute Handling" setting in the Agent's `Behavior Parameters`. The possible
values for this are:

* **Ignore** (default) - All ObservableAttributes on the Agent will be ignored.
  If there are no ObservableAttributes on the Agent, this will result in the
  fastest initialization time.
* **Exclude Inherited** - Only members on the declared class will be examined;
  members that are inherited are ignored. This is a reasonable tradeoff between
  performance and flexibility.
* **Examine All** - All members on the class will be examined. This can lead to
  slower startup times.

"Exclude Inherited" is generally sufficient, but if your Agent inherits from
another Agent implementation that has Observable members, you will need to use
"Examine All".
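
For example (a sketch with hypothetical classes):

```csharp
public class BaseAgent : Agent
{
    [Observable]
    public float Health;  // declared on the base class
}

public class DerivedAgent : BaseAgent
{
    // For DerivedAgent to observe Health, "Observable Attribute Handling"
    // must be set to "Examine All"; "Exclude Inherited" would skip members
    // declared on BaseAgent.
}
```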

Internally, `ObservableAttribute` uses reflection to determine which members of
the Agent have ObservableAttributes, and also uses reflection to access the
fields or invoke the properties at runtime. This may be slower than using
`CollectObservations` or an `ISensor`, although this might not be enough to
noticeably affect performance.

**NOTE**: you do not need to adjust the Space Size in the Agent's
`Behavior Parameters` when you add `[Observable]` fields or properties to an
Agent, since their size can be computed before they are used.

#### ISensor interface and SensorComponents

The `ISensor` interface is generally intended for advanced users. The `Write()`
method is used to actually generate the observation, but some other methods,
such as returning the shape of the observations, must also be implemented.

The `SensorComponent` abstract class is used to create the actual `ISensor` at
runtime. It must be attached to the same `GameObject` as the `Agent`, or to a
child `GameObject`.
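
As a sketch, a custom sensor exposing a single float (a hypothetical battery
level) might look like the following. The method names follow the `ISensor`
API referenced on this page (`Write()`, shape reporting, `CreateSensor()`);
check them against your installed release, and note that `Battery` and
`NormalizedCharge` are made-up names for this example:

```csharp
using Unity.MLAgents.Sensors;
using UnityEngine;

public class BatteryLevelSensor : ISensor
{
    readonly Battery m_Battery;  // hypothetical component holding the value

    public BatteryLevelSensor(Battery battery)
    {
        m_Battery = battery;
    }

    public string GetName() { return "BatteryLevel"; }

    // The observation is a vector of one float.
    public int[] GetObservationShape() { return new[] { 1 }; }

    public int Write(ObservationWriter writer)
    {
        writer[0] = m_Battery.NormalizedCharge;
        return 1;  // number of floats written
    }

    // Vector observations are not compressed.
    public byte[] GetCompressedObservation() { return null; }
    public SensorCompressionType GetCompressionType()
    {
        return SensorCompressionType.None;
    }

    public void Update() { }
    public void Reset() { }
}

public class BatteryLevelSensorComponent : SensorComponent
{
    public override ISensor CreateSensor()
    {
        return new BatteryLevelSensor(GetComponent<Battery>());
    }

    public override int[] GetObservationShape() { return new[] { 1 }; }
}
```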

There are several SensorComponents provided in the API:

- `CameraSensorComponent` - Allows an image from a `Camera` to be used as an
  observation.
- `RenderTextureSensorComponent` - Allows the content of a `RenderTexture` to
  be used as an observation.
- `RayPerceptionSensorComponent` - Allows information from a set of ray casts
  to be used as an observation.

**NOTE**: you do not need to adjust the Space Size in the Agent's
`Behavior Parameters` when using `ISensor` SensorComponents.

Internally, both `Agent.CollectObservations` and the `[Observable]` attribute
use `ISensor`s to write observations, although this is mostly abstracted from
the user.

### Vector Observations

Both `Agent.CollectObservations()` and `ObservableAttribute`s produce vector
observations, which are represented as lists of `float`s. `ISensor`s can
produce both vector observations and visual observations, which are
multi-dimensional arrays of floats.

Below are some additional considerations when dealing with vector observations:

#### One-hot encoding categorical information

Type enumerations should be encoded in the _one-hot_ style. That is, add an
element to the feature vector for each element of the enumeration, setting the
element representing the observed member to one and the rest to zero. For
example, if your enumeration contains the values Sword, Shield, and Bow and the
agent observes that the current item is a Bow, you would add the elements 0, 0,
1 to the feature vector. The following code example illustrates how to add a
one-hot observation:

```csharp
enum ItemType { Sword, Shield, Bow, LastItem }

public override void CollectObservations(VectorSensor sensor)
{
    for (int ci = 0; ci < (int)ItemType.LastItem; ci++)
    {
        sensor.AddObservation((int)currentItem == ci ? 1.0f : 0.0f);
    }
}
```

`VectorSensor` also provides a two-argument `AddOneHotObservation()` method as
a shortcut for _one-hot_ style observations. The following example is identical
to the previous one:

```csharp
enum ItemType { Sword, Shield, Bow, LastItem }
const int NUM_ITEM_TYPES = (int)ItemType.LastItem;

public override void CollectObservations(VectorSensor sensor)
{
    // The first argument is the selection index; the second is the
    // number of possibilities.
    sensor.AddOneHotObservation((int)currentItem, NUM_ITEM_TYPES);
}
```

`ObservableAttribute` has built-in support for enums. Note that you don't need
the `LastItem` placeholder in this case:

```csharp
enum ItemType { Sword, Shield, Bow }

public class HeroAgent : Agent
{
    [Observable]
    ItemType m_CurrentItem;
}
```

#### Normalization

For the best results when training, you should normalize the components of
your feature vector to the range [-1, +1] or [0, 1]. For angles that can be
outside the range [0, 360], you can either reduce the angle, or, if the number
of turns is significant, increase the maximum value used in your normalization
formula.
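
A sketch of both cases, assuming `minValue` and `maxValue` are known bounds
for the raw value, and that the angle has been reduced to [0, 180]:

```csharp
// Normalize a raw value into [0, 1] given known environment bounds.
float normalizedValue = (currentValue - minValue) / (maxValue - minValue);

// Normalize a rotation angle: ToAngleAxis returns degrees, so dividing
// a reduced angle by 180 maps it into [0, 1].
float angle;
Vector3 axis;
transform.rotation.ToAngleAxis(out angle, out axis);
float normalizedAngle = angle / 180.0f;
```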

#### Stacking

Stacking refers to repeating observations from previous steps as part of a
larger observation. For example, consider an Agent that generates these
observations in four steps:

```
step 1: [0.1]
step 2: [0.2]
step 3: [0.3]
step 4: [0.4]
```

If we use a stack size of 3, the observations would instead be:

```
step 1: [0.1, 0.0, 0.0]
step 2: [0.2, 0.1, 0.0]
step 3: [0.3, 0.2, 0.1]
step 4: [0.4, 0.3, 0.2]
```

(The observations are padded with zeroes for the first `stackSize - 1` steps.)

This is a simple way to give an Agent limited "memory" without the complexity
of adding a recurrent neural network (RNN).

The steps for enabling stacking depend on how you generate observations:

* For Agent.CollectObservations(), set "Stacked Vectors" on the Agent's
  `Behavior Parameters` to a value greater than 1.
* For ObservableAttribute, set the `numStackedObservations` parameter in the
  constructor, e.g. `[Observable(numStackedObservations: 2)]`.
* For `ISensor`s, wrap them in a `StackingSensor` (which is also an `ISensor`).
  Generally, this should happen in the `CreateSensor()` method of your
  `SensorComponent`, as sketched after this list.
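
Reusing the hypothetical `BatteryLevelSensor` from the earlier sketch, the
wrapping could look like this:

```csharp
public class StackedBatterySensorComponent : SensorComponent
{
    public override ISensor CreateSensor()
    {
        // Each observation the Agent sees becomes the concatenation of
        // the two most recent battery readings.
        var wrapped = new BatteryLevelSensor(GetComponent<Battery>());
        return new StackingSensor(wrapped, 2);
    }

    public override int[] GetObservationShape()
    {
        // The shape grows with the stack size: 1 float x 2 stacked.
        return new[] { 2 };
    }
}
```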

Note that stacking is currently only supported for vector observations;
stacking for visual observations is not supported.

#### Vector Observation Summary & Best Practices

- Vector Observations should include all variables relevant for allowing the
  agent to take the optimally informed decision.
- In cases where Vector Observations need to be remembered or compared over
  time, the `Stacked Vectors` value in the agent GameObject's
  `Behavior Parameters` should be changed.
- Categorical variables such as type of object (Sword, Shield, Bow) should be
  encoded in one-hot fashion (i.e. `3` -> `0, 0, 1`). This can be done
  automatically using the `AddOneHotObservation()` method of the
  `VectorSensor`, or using `[Observable]` on an enum field or property of the
  Agent.
- In general, all inputs should be normalized to be in the range 0 to +1 (or -1
  to 1). For example, the `x` position information of an agent where the
  maximum possible value is `maxValue` should be recorded as
  `VectorSensor.AddObservation(transform.position.x / maxValue);` rather than
  `VectorSensor.AddObservation(transform.position.x);`.