# Agents
An agent is an entity that can observe its environment, decide on the best
course of action using those observations, and execute those actions within
its environment. Agents can be created in Unity by extending the `Agent` class.
The most important aspects of creating agents that can successfully learn are
the observations the agent collects and the reward you assign to estimate the
value of the agent's current state toward accomplishing its tasks.
An Agent passes its observations to its Policy. The Policy then makes a decision
and passes the chosen action back to the agent. Your agent code must execute the
action, for example, move the agent in one direction or another. In order to
[train an agent using reinforcement learning](Learning-Environment-Design.md),
your agent must calculate a reward value at each action. The reward is used to
discover the optimal decision-making policy.
The `Policy` class abstracts out the decision making logic from the Agent itself so
that you can use the same Policy in multiple Agents. How a Policy makes its
decisions depends on the `Behavior Parameters` associated with the agent. If you
set `Behavior Type` to `Heuristic Only`, the Agent will use its `Heuristic()`
method to make decisions, which allows you to control the Agent manually or
write your own Policy. If the Agent has a `Model` file, its Policy will use
the neural network `Model` to make decisions.
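For example, a minimal `Heuristic()` sketch that maps keyboard input to a two-element
continuous action (the `float[]` signature shown here may differ between ML-Agents versions):

```csharp
// Minimal sketch: manual control via Heuristic when Behavior Type is set to
// Heuristic Only. Assumes a 2-element continuous action space.
public override void Heuristic(float[] actionsOut)
{
    actionsOut[0] = Input.GetAxis("Horizontal"); // action 0: left/right
    actionsOut[1] = Input.GetAxis("Vertical");   // action 1: forward/back
}
```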
## Decisions
If you need the Agent to request decisions on its own at regular intervals, add a
`Decision Requester` component to the Agent's GameObject. Making decisions at regular step
intervals is generally most appropriate for physics-based simulations. Projects that make
decisions only when certain game or simulation events occur, such as in a turn-based game,
should call `Agent.RequestDecision()` manually.
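For instance, a turn-based game might request a decision only when the agent's turn begins.
A minimal sketch, where `OnTurnStarted()` is a hypothetical hook called by your own game logic:

```csharp
// Minimal sketch: request a decision on demand instead of using a
// Decision Requester component. OnTurnStarted() is a hypothetical method
// that your game logic calls when it is this agent's turn.
public void OnTurnStarted()
{
    RequestDecision();
}
```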
## Observations and Sensors
To make informed decisions, an agent must first make observations of the state of
the environment. The observations are collected by Sensors attached to the agent
GameObject. By default, agents come with a `VectorSensor` which allows them to
collect floating-point observations into a single array. There are additional
sensor components which can be attached to the agent GameObject to collect their own
observations or modify other observations. These are:
* `CameraSensorComponent` - Allows an image from a `Camera` to be used as an observation.
* `RenderTextureSensorComponent` - Allows the content of a `RenderTexture` to be used as an observation.
* `RayPerceptionSensorComponent` - Allows information from a set of ray-casts to be used as an observation.
When you use vector observations for an Agent, implement the
`Agent.CollectObservations(VectorSensor sensor)` method to create the feature vector. When you use
**Visual Observations**, you only need to identify which Unity Camera objects
or RenderTextures will provide images and the base Agent class handles the rest.
You do not need to implement the `CollectObservations(VectorSensor sensor)` method when your Agent
uses visual observations (unless it also uses vector observations).
### Vector Observations
Vector observations are best used for aspects of the environment which are numerical
and non-visual. The Policy class calls the `CollectObservations(VectorSensor sensor)`
method of each Agent. Your implementation of this function must call
`VectorSensor.AddObservation` to add vector observations.
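For example, a minimal sketch of a `CollectObservations()` override. The class name, the
`target` field, and the `Rigidbody` are illustrative assumptions, not requirements of the API:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class RollerAgent : Agent          // hypothetical example agent
{
    public Transform target;              // assumed: a goal object assigned in the Inspector
    Rigidbody rb;

    void Start()
    {
        rb = GetComponent<Rigidbody>();
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Relative position of the target (3 floats).
        sensor.AddObservation(target.position - transform.position);
        // Agent velocity in the plane (2 floats).
        sensor.AddObservation(rb.velocity.x);
        sensor.AddObservation(rb.velocity.z);
        // Total: 5 values, which must match the Space Size set in Behavior Parameters.
    }
}
```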
In order for an agent to learn, the observations should include all the
information an agent needs to accomplish its task. Without sufficient and relevant
information, an agent may learn poorly or may not learn at all. A reasonable
approach for determining what information should be included is to consider what
you would need to calculate an analytical solution to the problem, or what you
would expect a human to be able to use to solve the problem.
For examples of various state observation functions, you can look at the
[example environments](Learning-Environment-Examples.md) included in the
ML-Agents Toolkit. Keep in mind that the feature vector must always contain the
same number of elements, so if the number of observed entities in an environment
can vary, limit the observation to a fixed subset. For example, instead of observing
every enemy agent in an environment, you could only observe the closest five.
When you set up an Agent's `Behavior Parameters` in the Unity Editor, set the following
properties to use a vector observation:
* **Space Size** — The state size must match the length of your feature vector.
The `VectorSensor.AddObservation` method provides a number of overloads for adding
common types of data to your observation vector. You can add integers and booleans
directly to the observation vector, as well as some common Unity data types such as
`Vector2`, `Vector3`, and `Quaternion`.
#### One-hot encoding categorical information
Type enumerations should be encoded in the _one-hot_ style. That is, add an
element to the feature vector for each element of the enumeration, setting the
element corresponding to the observed member to one and the rest to zero. For
example, if the enumeration contains [Sword, Shield, Bow] and the agent observes
that the current item is a Bow, you would add the elements 0, 0, 1 to the feature
vector.
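A minimal sketch, assuming a hypothetical `ItemType` enum and a `currentItem` field on the agent:

```csharp
enum ItemType { Sword, Shield, Bow, LastItem }
ItemType currentItem;   // hypothetical field holding the currently observed item

public override void CollectObservations(VectorSensor sensor)
{
    // Add one element per enumeration value: 1 for the observed item, 0 otherwise.
    for (int ci = 0; ci < (int)ItemType.LastItem; ci++)
    {
        sensor.AddObservation((int)currentItem == ci ? 1.0f : 0.0f);
    }
}
```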
`VectorSensor` also provides a two-argument method, `AddOneHotObservation()`, as a shortcut
for _one-hot_ style observations. The following example is equivalent to the previous one.
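Again assuming the hypothetical `ItemType` enum and `currentItem` field:

```csharp
public override void CollectObservations(VectorSensor sensor)
{
    // First argument: the index of the observed value; second: the number of possible values.
    sensor.AddOneHotObservation((int)currentItem, (int)ItemType.LastItem);
}
```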
#### Normalization

For the best results when training, normalize the components of your feature vector
to the range [-1, +1] or [0, 1]. For angles that can be outside the range [0, 360],
you can either reduce the angle, or, if the number of turns is significant, increase
the maximum value used in your normalization formula.
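For example, a minimal sketch (inside `CollectObservations()`) that normalizes a yaw angle:

```csharp
// Normalize a rotation angle of 0-360 degrees into the range [-1, 1]
// before adding it as an observation.
float yaw = transform.rotation.eulerAngles.y;
sensor.AddObservation(yaw / 180.0f - 1.0f);
```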
#### Vector Observation Summary & Best Practices
* Vector Observations should include all variables relevant for allowing the
  agent to make an optimally informed decision, and ideally no extraneous information.
* In cases where Vector Observations need to be remembered or compared over
time, either an LSTM (see [here](Feature-Memory.md)) should be used in the model, or the
`Stacked Vectors` value in the agent GameObject's `Behavior Parameters` should be changed.
* Categorical variables such as type of object (Sword, Shield, Bow) should be
encoded in one-hot fashion (i.e. `3` -> `0, 0, 1` ). This can be done automatically using the
`AddOneHotObservation()` method of the `VectorSensor` .
* In general, all inputs should be normalized to be in
the range 0 to +1 (or -1 to 1). For example, the `x` position information of
an agent where the maximum possible value is `maxValue` should be recorded as
`VectorSensor.AddObservation(transform.position.x / maxValue);` rather than
`VectorSensor.AddObservation(transform.position.x);` .
* Positional information of relevant GameObjects should be encoded in relative
coordinates wherever possible. This is often relative to the agent position.
### Visual Observations
Visual observations are generally provided to an agent via either a `CameraSensor` or a `RenderTextureSensor`.
These collect image information and transform it into a 3D Tensor which
can be fed into the convolutional neural network (CNN) of the agent policy. For more information on
CNNs, see [this guide](http://cs231n.github.io/convolutional-networks/). This allows agents
to learn from spatial regularities in the observation images. It is possible to
use visual and vector observations with the same agent.
Compared to vector observations, visual observations are generally less efficient and
slower to train, and sometimes do not succeed at all. As such, they should only be
used when it is not possible to properly define the problem using vector or ray-cast observations.
Visual observations can be derived from Cameras or RenderTextures within your scene.
To add a visual observation to an Agent, add either a Camera Sensor Component
or a Render Texture Sensor Component to the Agent GameObject, then assign the
Camera or RenderTexture you want to use in the component's corresponding field.
![Agent RenderTexture Debug](images/gridworld.png)
#### Visual Observation Summary & Best Practices
* To collect visual observations, attach `CameraSensor` or `RenderTextureSensor`
components to the agent GameObject.
* Visual observations should generally only be used when vector observations are not sufficient.
* Image size should be kept as small as possible, without the loss of
needed details for decision making.
* Images should be made greyscale in situations where color information is
not needed for making informed decisions.
### Raycast Observations

Raycasts are another possible method for providing observations to an agent.
This can be easily implemented by adding a
`RayPerceptionSensorComponent3D` (or `RayPerceptionSensorComponent2D`) to the Agent GameObject.
During observations, several rays (or spheres, depending on settings) are cast into
the physics world, and the objects that are hit determine the observation vector that
is produced. The sensor component exposes a number of settings, including:
* _Start Vertical Offset_ (3D only) The vertical offset of the ray start point.
* _End Vertical Offset_ (3D only) The vertical offset of the ray end point.
In the example image above, the Agent has two `RayPerceptionSensorComponent3D`s.
Both use 3 Rays Per Direction and 90 Max Ray Degrees. One of the components
has a vertical offset, so the Agent can tell whether it's clear to jump over
the wall.
The ray-cast observations are handled by the sensor component itself and are separate
from the `Space Size` defined in the Agent's `Behavior Parameters`, so you don't need
to account for them when setting the State Size.
#### RayCast Observation Summary & Best Practices
* Attach a `RayPerceptionSensorComponent3D` or `RayPerceptionSensorComponent2D` to the Agent GameObject to use ray-cast observations.
* This observation type is best used when there is relevant spatial information
for the agent that doesn't require a fully rendered image to convey.
* Use as few rays and tags as necessary to solve the problem in order to improve learning stability and agent performance.
## Actions
An action is an instruction from the Policy that the agent carries out. The action
is passed to the Agent as a parameter when the Academy invokes the
agent's `OnActionReceived()` function. Actions for an agent can take one of two forms, either **Continuous** or **Discrete**.

When you specify that the vector action space is **Continuous**, the action parameter
passed to the Agent is an array of floating point numbers with length equal to the
`Vector Action Space Size` property.

When you specify a **Discrete** vector action space type, the action parameter
is an array containing integers. Each integer is an index into a list or table
of commands. In the **Discrete** vector action space type, `Branches` is an
array of integers; each value corresponds to the number of possibilities for
each branch.
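For example, a minimal sketch of handling a two-element **Continuous** action, assuming
the `float[]`-based signature and an agent that moves a `Rigidbody` (the `moveSpeed`
field is a hypothetical parameter):

```csharp
public override void OnActionReceived(float[] vectorAction)
{
    // Interpret the two continuous values as a force in the X-Z plane.
    var force = new Vector3(vectorAction[0], 0f, vectorAction[1]);
    GetComponent<Rigidbody>().AddForce(force * moveSpeed);
}
```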
For example, if we wanted an Agent that can move in a plane and jump, we could
define two branches (one for motion and one for jumping) because we want our
agent to be able to move __and__ jump concurrently. We define the first branch to
have 5 possible actions (don't move, go left, go right, go backward, go forward)
and the second branch to have 2 possible actions (don't jump, jump).
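A minimal sketch of handling these two branches in `OnActionReceived()`, assuming the
`float[]`-based signature and hypothetical `Move()` and `Jump()` helpers:

```csharp
public override void OnActionReceived(float[] vectorAction)
{
    // Branch 0: 5 values (0 = don't move, 1 = left, 2 = right, 3 = backward, 4 = forward).
    int moveAction = Mathf.FloorToInt(vectorAction[0]);
    // Branch 1: 2 values (0 = don't jump, 1 = jump).
    int jumpAction = Mathf.FloorToInt(vectorAction[1]);

    Move(moveAction);      // hypothetical movement helper
    if (jumpAction == 1)
    {
        Jump();            // hypothetical jump helper
    }
}
```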
#### Masking Discrete Actions

When using Discrete Actions, it is possible to specify that some actions are
impossible for the next decision. When the Agent is controlled by its
neural network, the Agent will be unable to perform the specified action. Note
that when the Agent is controlled by its Heuristic, the Agent will
still be able to decide to perform the masked action. In order to mask an
action, override the `Agent.CollectDiscreteActionMasks()` virtual method, and call `DiscreteActionMasker.SetMask()` in it:
```csharp
public override void CollectDiscreteActionMasks(DiscreteActionMasker actionMasker)
{
    // Mask actions 1 and 2 of branch 0 for the next decision.
    actionMasker.SetMask(0, new int[]{1, 2});
}
```

Notes:

* You cannot mask all the actions of a branch.
* You cannot mask actions in continuous control.
### Actions Summary & Best Practices
* Actions can either use `Discrete` or `Continuous` spaces.
* When using `Discrete` it is possible to assign multiple action branches, and to mask certain actions.
* In general, smaller action spaces will make for easier learning.
* Be sure to set the Vector Action's Space Size to the number of used Vector
Actions, and not greater, as doing the latter can interfere with the
efficiency of the training process.
* When using continuous control, action values should be clipped to an
appropriate range. The provided PPO model automatically clips these values
between -1 and 1, but third party training systems may not do so.
## Rewards
In reinforcement learning, the reward is a signal that the agent has done
something well. The reinforcement learning algorithm optimizes the agent's decisions
so that the agent earns the highest cumulative reward over time; the better your
reward signal, the better your agent will learn.

Perhaps the best advice is to start simple and only add complexity as needed. In
general, you should reward results rather than actions you think will lead to
the desired results. You can even use the Agent's `Heuristic()` to control the Agent
while watching how it accumulates rewards.
Allocate rewards to an Agent by calling the `AddReward()` or `SetReward()` methods on the agent.
The reward assigned between each decision should be in proportion to how good the previous
decision was. `SetReward()` will override all previous rewards given to an agent since the
previous decision.
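For example, a minimal sketch that uses both methods inside `OnActionReceived()`. The
`reachedGoal` and `fellOff` conditions are hypothetical:

```csharp
public override void OnActionReceived(float[] vectorAction)
{
    // ... execute the action ...

    if (reachedGoal)            // hypothetical success condition
    {
        SetReward(1.0f);        // replaces any reward accumulated since the last decision
        EndEpisode();
    }
    else if (fellOff)           // hypothetical failure condition
    {
        AddReward(-1.0f);
        EndEpisode();
    }
    else
    {
        AddReward(-0.001f);     // small per-step penalty to encourage finishing quickly
    }
}
```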
### Examples
You can see concrete reward functions in the [example environments](Learning-Environment-Examples.md)
included in the ML-Agents Toolkit. Note that all of these environments make use of the
`EndEpisode()` method, which manually terminates an episode when a termination condition is
reached. This can be called independently of the `Max Step` property.
### Rewards Summary & Best Practices
* Use `AddReward()` to accumulate rewards between decisions. Use `SetReward()`
  to overwrite any previous rewards accumulated between decisions.
* The magnitude of any given reward should typically not be greater than 1.0 in
order to ensure a more stable learning process.
* Positive rewards are often more helpful to shaping the desired behavior of an
agent than negative rewards. Excessive negative rewards can result in the agent
failing to learn any meaningful behavior.
* For locomotion tasks, a small positive reward (+0.1) for forward velocity is
typically used.
* If you want the agent to finish a task quickly, it is often helpful to provide
a small penalty every step (-0.05) that the agent does not complete the task.
In this case completion of the task should also coincide with the end of the
episode by calling `EndEpisode()` on the agent when it has accomplished its goal.
## Agent Properties