# Agents
An agent is an actor that can observe its environment and decide on the best
course of action using those observations. Create Agents in Unity by extending
the Agent class. The most important aspects of creating agents that can
successfully learn are the observations the agent collects and, for
reinforcement learning, the reward you assign to estimate the value of the
agent's current state toward accomplishing its tasks.
An Agent passes its observations to its Brain. The Brain then makes a decision
and passes the chosen action back to the agent. Your agent code must execute the
action, for example, move the agent in one direction or another. In order to
[train an agent using reinforcement learning](Learning-Environment-Design.md),
your agent must calculate a reward value at each action. The reward is used to
discover the optimal decision-making policy.
The Brain class abstracts out the decision making logic from the Agent itself so
that you can use the same Brain in multiple Agents. How a Brain makes its
decisions depends on the type of Brain it is. An **External** Brain simply
passes the observations from its Agents to an external process and then passes
the decisions made externally back to the Agents. An **Internal** Brain uses the
trained policy parameters to make decisions (and no longer adjusts the
parameters in search of a better decision). The other types of Brains do not
directly involve training, but you might find them useful as part of a training
project. See [Brains](Learning-Environment-Design-Brains.md).
## Decisions

The observation-decision-action-reward cycle repeats after a configurable number
of simulation steps (the frequency defaults to once-per-step). You can also set
up an Agent to request decisions on demand. Making decisions at regular step
intervals is generally most appropriate for physics-based simulations. Making
decisions on demand is generally appropriate for situations where Agents only
respond to specific events or take actions of variable duration. For example, an
agent in a robotic simulator that must provide fine-control of joint torques
should make its decisions every step of the simulation. On the other hand, an
agent that only needs to make decisions when certain game or simulation events
occur should use on-demand decision making.
To control the frequency of step-based decision making, set the **Decision
Frequency** value for the Agent object in the Unity Inspector window. Agents
using the same Brain instance can each use a different frequency. During
simulation steps in which no decision is requested, the Agent receives the same
action chosen by the previous decision.
On demand decision making allows Agents to request decisions from their Brains
only when needed instead of receiving decisions at a fixed frequency. This is
useful when the agents commit to an action for a variable number of steps or
when the agents cannot make decisions at the same time. This is typically the
case for turn-based games, games where agents must react to events, or games
where agents can take actions of variable duration.

When you turn on **On Demand Decisions** for an Agent, your agent code must call
the `Agent.RequestDecision()` function. This function call starts one iteration
of the observation-decision-action-reward cycle. The Brain invokes the Agent's
`CollectObservations()` method, makes a decision, and returns it by calling the
`AgentAction()` method. The Brain waits for the Agent to request the next
decision before starting another iteration.
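As a minimal sketch of on-demand decision making (the `TurnStarted()` check is a
hypothetical, game-specific condition, not part of the ML-Agents API):

```csharp
public class TurnBasedAgent : Agent
{
    void FixedUpdate()
    {
        // Only ask the Brain for a new decision when this agent's turn begins.
        if (TurnStarted())
        {
            RequestDecision();
        }
    }

    // Game-specific logic for detecting the start of a turn goes here.
    bool TurnStarted()
    {
        return false;
    }
}
```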
## Observations
An agent's observations can take the following forms:

* **Vector Observation** — a feature vector consisting of an array of floating
  point numbers.
* **Visual Observations** — one or more camera images.
When you use vector observations for an Agent, implement the
`Agent.CollectObservations()` method to create the feature vector. When you use
visual observations, you only need to identify which Unity Camera objects will
provide the images, and the base Agent class handles the rest. You do not need
to implement the `CollectObservations()` method when your Agent uses visual
observations (unless it also uses vector observations).
### Vector Observation Space: Feature Vectors
The Brain class calls the `CollectObservations()` method of each of its Agents.
Your implementation of this method must call `AddVectorObs` to add vector
observations.

The observation must include all the information an agent needs to accomplish
its task. Without sufficient and relevant information, an agent may learn poorly
or may not learn at all. A reasonable approach for determining what information
should be included is to consider what you would need to calculate an analytical
solution to the problem. In some cases, you may need to limit
an agent's observations to a fixed subset. For example, instead of observing
every enemy agent in an environment, you could only observe the closest five.
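As a minimal sketch of a `CollectObservations()` implementation (the `ball`
field is illustrative and assumed to be assigned in the Inspector, similar to
the 3DBall example):

```csharp
public override void CollectObservations()
{
    // Orientation of the platform around the z and x axes (2 floats).
    AddVectorObs(gameObject.transform.rotation.z);
    AddVectorObs(gameObject.transform.rotation.x);
    // Position of the ball relative to this agent (3 floats).
    AddVectorObs(ball.transform.position - gameObject.transform.position);
    // Velocity of the ball (3 floats).
    AddVectorObs(ball.GetComponent<Rigidbody>().velocity);
}
```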
When you set up an Agent's Brain in the Unity Editor, set the following
properties to use a continuous vector observation:
* **Space Size** — The state size must match the length of your feature vector.
### Multiple Visual Observations
Camera observations use rendered textures from one or more cameras in a scene.
The Brain vectorizes the textures into a 3D Tensor which can be fed into a
convolutional neural network (CNN). For more information on CNNs, see [this
guide](http://cs231n.github.io/convolutional-networks/). You can use camera
observations alongside vector observations.

Agents using camera images can capture states of arbitrary complexity and are
useful when the state is difficult to describe numerically. However, they are
also typically less efficient and slower to train, and sometimes don't succeed
at all.
To add a visual observation to an Agent, click on the `Add Camera` button in the
Agent inspector. Then drag the camera you want to add into the `Camera` field.
You can have more than one camera attached to an Agent.

In addition, make sure the Agent's Brain expects visual observations. In the
Brain inspector, specify the number of Cameras the Agent is using for its visual
observations. For each visual observation, set the width and height of the image
(in pixels) and whether or not the observation is color or grayscale (when
`Black And White` is checked).
## Vector Actions

An action is an instruction from the Brain that the agent carries out. The
action is passed to the Agent as a parameter when the Academy invokes the
agent's `AgentAction()` function. When you specify that the vector action space
is **Continuous**, the action parameter passed to the Agent is an array of
control signals with length equal to the `Vector Action Space Size` property.
When you specify a **Discrete** vector action space type, the action parameter
is an array containing integers. Each integer is an index into a list or table
of commands. Each element of the array corresponds to an action table, and you
can specify the size of each table by modifying the `Branches` property. Set the
`Vector Action Space Size` and `Vector Action Space Type` properties on the
Brain object assigned to the Agent.

Neither the Brain nor the training algorithm knows anything about what the
action values themselves mean. The training algorithm simply tries different
values for the action list and observes the effect on the accumulated reward
over time and many training episodes. Thus, the only place actions are defined
for an Agent is in the `AgentAction()` function. You simply specify the type of
vector action space, and, for the continuous vector action space, the number of
values, and then apply the received values appropriately (and consistently) in
`AgentAction()`.
For example, if you designed an agent to move in two dimensions, you could use
either continuous or the discrete vector actions. In the continuous case, you
would set the vector action size to two (one for each dimension), and the
agent's Brain would create an action with two floating point values. In the
discrete case, you would use one branch with a size of four (one for each
direction), and the Brain would create an action array containing a single
element with a value ranging from zero to three. Alternatively, you could create
two branches of size two (one for the horizontal movement and one for the
vertical movement), and the Brain would create an action array containing two
elements with values ranging from zero to one.

Note that when you are programming actions for an agent, it is often helpful to
test your action logic using a **Player** Brain, which lets you map keyboard
commands to actions. See [Brains](Learning-Environment-Design-Brains.md).
The [3DBall](Learning-Environment-Examples.md#3dball-3d-balance-ball) and
several other [example environments](Learning-Environment-Examples.md) are set
up to use either the continuous or the discrete vector action spaces.
### Continuous Action Space
When an Agent uses a Brain set to the **Continuous** vector action space, the
action parameter passed to the Agent's `AgentAction()` function is an array with
length equal to the Brain's `Vector Action Space Size` property value. The
individual values in the array have whatever meanings that you ascribe to
them. If you assign an element in the array as the speed of an Agent, for
example, the training process learns to control the speed of the Agent through
this parameter.
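As a rough sketch of that speed example (the `AgentAction()` signature shown
here and the scaling constant are illustrative assumptions, not code from the
toolkit):

```csharp
public override void AgentAction(float[] vectorAction, string textAction)
{
    // Interpret the first continuous action value as a normalized speed,
    // scaled by an arbitrary maximum of 5 units per second.
    float speed = Mathf.Clamp(vectorAction[0], -1f, 1f) * 5f;
    GetComponent<Rigidbody>().velocity = transform.forward * speed;
}
```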
The [Reacher example](Learning-Environment-Examples.md#reacher) defines a
continuous action space with four control values, which are applied as torques
to the bodies making up the arm.
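A sketch of how such an `AgentAction()` implementation might apply those four
values (assuming `rbA` and `rbB` are `Rigidbody` fields referencing the two arm
segments, assigned in the Inspector; this is not the example's exact code):

```csharp
public override void AgentAction(float[] vectorAction, string textAction)
{
    // First pair of values: torques around x and z for the first segment.
    float torqueX = Mathf.Clamp(vectorAction[0], -1f, 1f) * 100f;
    float torqueZ = Mathf.Clamp(vectorAction[1], -1f, 1f) * 100f;
    rbA.AddTorque(new Vector3(torqueX, 0f, torqueZ));

    // Second pair of values: torques for the second segment.
    torqueX = Mathf.Clamp(vectorAction[2], -1f, 1f) * 100f;
    torqueZ = Mathf.Clamp(vectorAction[3], -1f, 1f) * 100f;
    rbB.AddTorque(new Vector3(torqueX, 0f, torqueZ));
}
```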
### Discrete Action Space
When an Agent uses a Brain set to the **Discrete** vector action space, the
action parameter passed to the Agent's `AgentAction()` function is an array of
indices. With the discrete vector action space, `Branches` is an array of
integers, and each value corresponds to the number of possibilities for each
branch.

For example, if we wanted an Agent that can move in a plane and jump, we could
define two branches (one for motion and one for jumping) because we want our
agent to be able to move __and__ jump concurrently. We define the first branch
to have 5 possible actions (don't move, go left, go right, go backward, go
forward) and the second one to have 2 possible actions (don't jump, jump). The
`AgentAction()` method would look something like:
```csharp
// Get the action indices for movement (branch 0) and jumping (branch 1)
int movement = Mathf.FloorToInt(act[0]);
int jump = Mathf.FloorToInt(act[1]);
// Look up the index in the movement action list:
if (movement == 1) { directionX = -1; }
if (movement == 2) { directionX = 1; }
if (movement == 3) { directionZ = -1; }
if (movement == 4) { directionZ = 1; }
// Look up the index in the jump action list:
if (jump == 1 && IsGrounded()) { directionY = 1; }
// Apply the action results to move the Agent
gameObject.GetComponent<Rigidbody>().AddForce(
    new Vector3(
        directionX * 40f, directionY * 300f, directionZ * 40f));
```
#### Masking Discrete Actions
When using Discrete Actions, it is possible to specify that some actions are
impossible for the next decision. When the Agent is controlled by an External or
Internal Brain, the Agent will be unable to perform the specified action. Note
that when the Agent is controlled by a Player or Heuristic Brain, the Agent will
still be able to decide to perform the masked action. In order to mask an
action, call the method `SetActionMask` within the `CollectObservations()`
method:
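```csharp
SetActionMask(branch, actionIndices)
```

Where: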
* `branch` is the index (starting at 0) of the branch on which you want to mask
the action
* `actionIndices` is a list of `int` or a single `int` corresponding to the
index of the action that the Agent cannot perform.
For example, if you have an Agent with 2 branches and on the first branch
(branch 0) there are 4 possible actions: _"do nothing"_, _"jump"_, _"shoot"_
and _"change weapon"_. Then with the code below, the Agent will either _"do
nothing"_ or _"change weapon"_ for its next decision (since action indices 1 and
2 are masked):
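A minimal sketch of the corresponding call, placed inside
`CollectObservations()`, masking indices 1 and 2 on branch 0:

```csharp
SetActionMask(0, new int[2] { 1, 2 });
```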
## Rewards

In reinforcement learning, the reward is a signal that the agent has done
something right. The training algorithm works by optimizing the choices an agent
makes so that the agent earns the highest cumulative
reward over time. The better your reward mechanism, the better your agent will
learn.
**Note:** Rewards are not used during inference by a Brain using an already
trained policy.

To help develop your rewards, you can use the `Monitor` class
to display the cumulative reward received by an Agent. You can even use a Player
Brain to control the Agent while watching how it accumulates rewards.
Allocate rewards to an Agent by calling the `AddReward()` method in the
`AgentAction()` function. The reward assigned in any step should be in the range
[-1,1]. Values outside this range can lead to unstable training. The `reward`
value is reset to zero at every step.
For example, the 3DBall balance example assigns a small positive reward for each
step the agent keeps the ball balanced and a negative reward when the ball
falls:

```csharp
if (IsDone() == false)
{
    SetReward(0.1f);
}

// When ball falls mark Agent as done and give a negative penalty
if ((ball.transform.position.y - gameObject.transform.position.y) < -2f ||
    Mathf.Abs(ball.transform.position.x - gameObject.transform.position.x) > 3f ||
    Mathf.Abs(ball.transform.position.z - gameObject.transform.position.z) > 3f)
{
    SetReward(-1f);
    Done();
}
```
## Agent Properties

![Agent Inspector](images/agent.png)
* `Brain` - The Brain to register this Agent to. Can be dragged into the
  inspector using the Editor.
* `Max Step` - The per-agent maximum number of steps. Once this number is
  reached, the Agent will be reset if `Reset On Done` is checked.
* `Reset On Done` - Whether the Agent's `AgentReset()` function should be called
  when the Agent reaches its `Max Step` count or is marked as done in code.
* `On Demand Decision` - Whether the Agent requests decisions at a fixed step
  interval or explicitly requests decisions by calling `RequestDecision()`.
  * If not checked, the Agent will request a new decision every `Decision
    Frequency` steps and perform an action every step. In the example above,
    `CollectObservations()` is called every `Decision Frequency` steps while
    `AgentAction()` is called at every step; in between decisions, the Agent
    reuses the action the Brain last gave it.
  * If checked, the Agent controls the frequency of its decisions and actions
    by calling `RequestDecision()` and `RequestAction()`.
    * `RequestDecision()` Signals that the Agent is requesting a decision. This
      causes the Agent to collect its observations and ask the Brain for a
      decision at the next step of the simulation.
    * `RequestAction()` Signals that the Agent is requesting an action. The
      action provided to the Agent in this case is the same action that was
      provided the last time it requested a decision.
* `Decision Frequency` - The number of steps between decision requests. Not used
  if `On Demand Decision` is true.
## Monitoring Agents

We created a helpful `Monitor` class that enables visualizing variables within a
Unity environment. While this was built for monitoring an Agent's value function
throughout the training process, we imagine it can be more broadly useful. You
can learn more [here](Feature-Monitor.md).
## Instantiating an Agent at Runtime

You can add Agents to your Unity scene at runtime using the
`GameObject.Instantiate()` function. It is typically easiest to instantiate an
agent from a [Prefab](https://docs.unity3d.com/Manual/Prefabs.html) (otherwise,
you have to instantiate every GameObject and Component that make up your Agent
individually). In addition, you must assign the new Agent a Brain and initialize
it by calling its `AgentReset()` method. For example, the
following function creates a new Agent given a Prefab, Brain instance, location,
and orientation:

```csharp
private void CreateAgent(GameObject agentPrefab, Brain brain, Vector3 position, Quaternion orientation)
{
    GameObject agentObj = Instantiate(agentPrefab, position, orientation);
    Agent agent = agentObj.GetComponent<Agent>();
    agent.GiveBrain(brain);
    agent.AgentReset();
}
```
## Destroying an Agent

Before destroying an Agent GameObject, you must mark it as done (and wait for
the next step in the simulation) so that the Brain knows that this Agent is no
longer active. Thus, the best place to destroy an Agent is in the
`Agent.AgentOnDone()` function:

```csharp
public override void AgentOnDone()
{
    Destroy(gameObject);
}
```
Note that in order for `AgentOnDone()` to be called, the Agent's `ResetOnDone`
property must be false. You can set `ResetOnDone` in the Agent's Inspector or in
code.