* Improvements to Getting Started guide
- Changed the ordered list to use "1."
- Trimmed down text
- Removed references to Agent APIs
* Incorporating feedback
* Prettier formatting
window. The Inspector shows every component on a GameObject.
The first thing you may notice after opening the 3D Balance Ball scene is that
it contains not one, but several agent cubes. Each agent cube in the scene is an
independent agent, but they all share the same Behavior. 3D Balance Ball does this
to speed up training since all twelve agents contribute to training in parallel.
it contains not one, but several agent cubes. Each agent cube in the scene is an
independent agent, but they all share the same Behavior. 3D Balance Ball does
this to speed up training since all twelve agents contribute to training in
parallel.
### Agent
behavior:
* **Behavior Parameters** — Every Agent must have a Behavior. The Behavior
determines how an Agent makes decisions. More on Behavior Parameters in
the next section.
* **Max Step** — Defines how many simulation steps can occur before the Agent's
- **Behavior Parameters** — Every Agent must have a Behavior. The Behavior
determines how an Agent makes decisions.
- **Max Step** — Defines how many simulation steps can occur before the Agent's
When you create an Agent, you must extend the base Agent class.
The Ball3DAgent subclass defines the following methods:
* `Agent.OnEpisodeBegin()` — Called at the beginning of an Agent's episode, including at the beginning
of the simulation. The Ball3DAgent class uses this function to reset the
agent cube and ball to their starting positions. The function randomizes the reset values so that the
training generalizes to more than a specific starting position and agent cube
attitude.
* `Agent.CollectObservations(VectorSensor sensor)` — Called every simulation step. Responsible for
collecting the Agent's observations of the environment. Since the Behavior
Parameters of the Agent are set with vector observation
space with a state size of 8, the `CollectObservations(VectorSensor sensor)` must call
`VectorSensor.AddObservation()` such that vector size adds up to 8.
* `Agent.OnActionReceived()` — Called every time the Agent receives an action to take. Receives the action chosen
by the Agent. The vector action spaces result in a
small change in the agent cube's rotation at each step. The `OnActionReceived()` method
assigns a reward to the Agent; in this example, an Agent receives a small
positive reward for each step it keeps the ball on the agent cube's head and a larger,
negative reward for dropping the ball. An Agent's episode is also ended when it
drops the ball so that it will reset with a new ball for the next simulation
step.
* `Agent.Heuristic()` - When the `Behavior Type` is set to `Heuristic Only` in the Behavior
Parameters of the Agent, the Agent will use the `Heuristic()` method to generate
the actions of the Agent. As such, the `Heuristic()` method takes an array of
floats. In the case of the Ball 3D Agent, the `Heuristic()` method converts the
keyboard inputs into actions.
#### Behavior Parameters : Vector Observation Space
Before making a decision, an agent collects its observation about its state in
The Behavior Parameters of the 3D Balance Ball example uses a **Space Size** of 8.
This means that the feature
vector containing the Agent's observations contains eight elements: the `x` and
`z` components of the agent cube's rotation and the `x`, `y`, and `z` components
of the ball's relative position and velocity. (The observation values are
defined in the Agent's `CollectObservations(VectorSensor sensor)` method.)
The Behavior Parameters of the 3D Balance Ball example uses a `Space Size` of 8.
This means that the feature vector containing the Agent's observations contains
eight elements: the `x` and `z` components of the agent cube's rotation and the
`x`, `y`, and `z` components of the ball's relative position and velocity.
An Agent is given instructions in the form of a float array of *actions*.
ML-Agents toolkit classifies actions into two types: the **Continuous** vector
action space is a vector of numbers that can vary continuously. What each
element of the vector means is defined by the Agent logic (the training
process just learns what values are better given particular state observations
based on the rewards received when it tries different values). For example, an
element might represent a force or torque applied to a `Rigidbody` in the Agent.
The **Discrete** action vector space defines its actions as tables. An action
given to the Agent is an array of indices into tables.
The 3D Balance Ball example is programmed to use continuous action
space with `Space Size` of 2.
An Agent is given instructions in the form of a float array of _actions_.
ML-Agents Toolkit classifies actions into two types: continuous and discrete.
The 3D Balance Ball example is programmed to use continuous action space which
is a a vector of numbers that can vary continuously. More specifically, it uses
a `Space Size` of 2 to control the amount of `x` and `z` rotations to apply to
itself to keep the ball balanced on its head.
[Unity Inference Engine](Unity-Inference-Engine.md) to run these models
inside Unity. In this section, we will use the pre-trained model for the
3D Ball example.
[Unity Inference Engine](Unity-Inference-Engine.md) to run these models inside
Unity. In this section, we will use the pre-trained model for the 3D Ball
example.
1. In the **Project** window, go to the `Assets/ML-Agents/Examples/3DBall/Scenes` folder
and open the `3DBall` scene file.
2. In the **Project** window, go to the `Assets/ML-Agents/Examples/3DBall/Prefabs` folder.
Expand `3DBall` and click on the `Agent` prefab. You should see the `Agent` prefab in the **Inspector** window.
1. In the **Project** window, go to the
`Assets/ML-Agents/Examples/3DBall/Prefabs` folder. Expand `3DBall` and click
on the `Agent` prefab. You should see the `Agent` prefab in the **Inspector**
window.
**Note**: The platforms in the `3DBall` scene were created using the `3DBall` prefab. Instead of updating all 12 platforms individually, you can update the `3DBall` prefab instead.
**Note**: The platforms in the `3DBall` scene were created using the `3DBall`
prefab. Instead of updating all 12 platforms individually, you can update the
`3DBall` prefab instead.
3. In the **Project** window, drag the **3DBall** Model located in
`Assets/ML-Agents/Examples/3DBall/TFModels` into the `Model` property under `Behavior Parameters (Script)` component in the Agent GameObject **Inspector** window.
1. In the **Project** window, drag the **3DBall** Model located in
`Assets/ML-Agents/Examples/3DBall/TFModels` into the `Model` property under
`Behavior Parameters (Script)` component in the Agent GameObject
**Inspector** window.
4. You should notice that each `Agent` under each `3DBall` in the **Hierarchy** windows now contains **3DBall** as `Model` on the `Behavior Parameters`. __Note__ : You can modify multiple game objects in a scene by selecting them all at
once using the search bar in the Scene Hierarchy.
8. Select the **InferenceDevice** to use for this model (CPU or GPU) on the Agent.
_Note: CPU is faster for the majority of ML-Agents toolkit generated models_
9. Click the **Play** button and you will see the platforms balance the balls
using the pre-trained model.
1. You should notice that each `Agent` under each `3DBall` in the **Hierarchy**
windows now contains **3DBall** as `Model` on the `Behavior Parameters`.
**Note** : You can modify multiple game objects in a scene by selecting them
all at once using the search bar in the Scene Hierarchy.
1. Set the **Inference Device** to use for this model as `CPU`.
1. Click the :arrow_forward: button in the Unity Editor and you will see the
platforms balance the balls using the pre-trained model.
While we provide pre-trained `.nn` files for the agents in this environment, any environment you make yourself will require training agents from scratch to generate a new model file. We can do this using reinforcement learning.
In order to train an agent to correctly balance the ball, we provide two
deep reinforcement learning algorithms.
The default algorithm is Proximal Policy Optimization (PPO). This
is a method that has been shown to be more general purpose and stable
than many other RL algorithms. For more information on PPO, OpenAI
has a [blog post](https://blog.openai.com/openai-baselines-ppo/)
explaining it, and [our page](Training-PPO.md) for how to use it in training.
We also provide Soft-Actor Critic, an off-policy algorithm that
has been shown to be both stable and sample-efficient.
For more information on SAC, see UC Berkeley's
[blog post](https://bair.berkeley.edu/blog/2018/12/14/sac/) and
[our page](Training-SAC.md) for more guidance on when to use SAC vs. PPO. To
use SAC to train Balance Ball, replace all references to `config/trainer_config.yaml`
with `config/sac_trainer_config.yaml` below.
To train the agents within the Balance Ball environment, we will be using the
ML-Agents Python package. We have provided a convenient command called `mlagents-learn`
which accepts arguments used to configure both training and inference phases.
While we provide pre-trained `.nn` files for the agents in this environment, any
environment you make yourself will require training agents from scratch to
generate a new model file. In this section we will demonstrate how to use the
reinforcement learning algorithms that are part of the ML-Agents Python package
to accomplish this. We have provided a convenient command `mlagents-learn` which
accepts arguments used to configure both training and inference phases.
2. Navigate to the folder where you cloned the ML-Agents toolkit repository.
**Note**: If you followed the default [installation](Installation.md), then
you should be able to run `mlagents-learn` from any directory.
3. Run `mlagents-learn <trainer-config-path> --run-id=<run-identifier>`
where:
- `<trainer-config-path>` is the relative or absolute filepath of the
trainer configuration. The defaults used by example environments included
in `MLAgentsSDK` can be found in `config/trainer_config.yaml`.
- `<run-identifier>` is a string used to separate the results of different
training runs. Make sure to use one that hasn't been used already!
4. If you cloned the ML-Agents repo, then you can simply run
:warning: For vectors, you should apply the above formula to each component (x, y, and z). Note that this is *not* the same as using the `Vector3.normalized` property or `Vector3.Normalize()` method in Unity (and similar for `Vector2`).
:warning: For vectors, you should apply the above formula to each component (x,
y, and z). Note that this is _not_ the same as using the `Vector3.normalized`
property or `Vector3.Normalize()` method in Unity (and similar for `Vector2`).
Rotations and angles should also be normalized. For angles between 0 and 360
degrees, you can use the following formulas:
#### Vector Observation Summary & Best Practices
* Vector Observations should include all variables relevant for allowing the
agent to take the optimally informed decision, and ideally no extraneous information.
* In cases where Vector Observations need to be remembered or compared over
time, either an LSTM (see [here](Feature-Memory.md)) should be used in the model, or the
`Stacked Vectors` value in the agent GameObject's `Behavior Parameters` should be changed.
* Categorical variables such as type of object (Sword, Shield, Bow) should be
encoded in one-hot fashion (i.e. `3` -> `0, 0, 1`). This can be done automatically using the
`AddOneHotObservation()` method of the `VectorSensor`.
* In general, all inputs should be normalized to be in
the range 0 to +1 (or -1 to 1). For example, the `x` position information of
an agent where the maximum possible value is `maxValue` should be recorded as
- Vector Observations should include all variables relevant for allowing the
agent to take the optimally informed decision, and ideally no extraneous
information.
- In cases where Vector Observations need to be remembered or compared over
time, either an LSTM (see [here](Feature-Memory.md)) should be used in the
model, or the `Stacked Vectors` value in the agent GameObject's
`Behavior Parameters` should be changed.
- Categorical variables such as type of object (Sword, Shield, Bow) should be
encoded in one-hot fashion (i.e. `3` -> `0, 0, 1`). This can be done
automatically using the `AddOneHotObservation()` method of the `VectorSensor`.
- In general, all inputs should be normalized to be in the range 0 to +1 (or -1
to 1). For example, the `x` position information of an agent where the maximum
possible value is `maxValue` should be recorded as
* Positional information of relevant GameObjects should be encoded in relative
- Positional information of relevant GameObjects should be encoded in relative
Visual observations are generally provided to agent via either a `CameraSensor` or `RenderTextureSensor`.
These collect image information and transforms it into a 3D Tensor which
can be fed into the convolutional neural network (CNN) of the agent policy. For more information on
CNNs, see [this guide](http://cs231n.github.io/convolutional-networks/). This allows agents
to learn from spatial regularities in the observation images. It is possible to
use visual and vector observations with the same agent.
Visual observations are generally provided to agent via either a `CameraSensor`
or `RenderTextureSensor`. These collect image information and transforms it into
a 3D Tensor which can be fed into the convolutional neural network (CNN) of the
agent policy. For more information on CNNs, see
[this guide](http://cs231n.github.io/convolutional-networks/). This allows
agents to learn from spatial regularities in the observation images. It is
possible to use visual and vector observations with the same agent.
used when it is not possible to properly define the problem using vector or ray-cast observations.
used when it is not possible to properly define the problem using vector or
ray-cast observations.
Visual observations can be derived from Cameras or RenderTextures within your scene.
To add a visual observation to an Agent, add either a Camera Sensor Component
or RenderTextures Sensor Component to the Agent. Then drag the camera or
render texture you want to add to the `Camera` or `RenderTexture` field.
You can have more than one camera or render texture and even use a combination
of both attached to an Agent. For each visual observation, set the width and height
of the image (in pixels) and whether or not the observation is color or grayscale.
Visual observations can be derived from Cameras or RenderTextures within your
scene. To add a visual observation to an Agent, add either a Camera Sensor
Component or RenderTextures Sensor Component to the Agent. Then drag the camera
or render texture you want to add to the `Camera` or `RenderTexture` field. You
can have more than one camera or render texture and even use a combination of
both attached to an Agent. For each visual observation, set the width and height
of the image (in pixels) and whether or not the observation is color or
grayscale.
![Agent Camera](images/visual-observation.png)
Each Agent that uses the same Policy must have the same number of visual observations,
and they must all have the same resolutions (including whether or not they are grayscale).
Additionally, each Sensor Component on an Agent must have a unique name so that they can
be sorted deterministically (the name must be unique for that Agent, but multiple Agents can
have a Sensor Component with the same name).
Each Agent that uses the same Policy must have the same number of visual
observations, and they must all have the same resolutions (including whether or
not they are grayscale). Additionally, each Sensor Component on an Agent must
have a unique name so that they can be sorted deterministically (the name must
be unique for that Agent, but multiple Agents can have a Sensor Component with
the same name).
adding a `Canvas`, then adding a `Raw Image` with it's texture set to the Agent's
`RenderTexture`. This will render the agent observation on the game screen.
adding a `Canvas`, then adding a `Raw Image` with it's texture set to the
Agent's `RenderTexture`. This will render the agent observation on the game
screen.
The [GridWorld environment](Learning-Environment-Examples.md#gridworld)
is an example on how to use a RenderTexture for both debugging and observation. Note
that in this example, a Camera is rendered to a RenderTexture, which is then used for
observations and debugging. To update the RenderTexture, the Camera must be asked to
render every time a decision is requested within the game code. When using Cameras
as observations directly, this is done automatically by the Agent.
The [GridWorld environment](Learning-Environment-Examples.md#gridworld) is an
example on how to use a RenderTexture for both debugging and observation. Note
that in this example, a Camera is rendered to a RenderTexture, which is then
used for observations and debugging. To update the RenderTexture, the Camera
must be asked to render every time a decision is requested within the game code.
When using Cameras as observations directly, this is done automatically by the
Agent.
* To collect visual observations, attach `CameraSensor` or `RenderTextureSensor`
- To collect visual observations, attach `CameraSensor` or `RenderTextureSensor`
* Visual observations should generally be used unless vector observations are not sufficient.
* Image size should be kept as small as possible, without the loss of
needed details for decision making.
* Images should be made greyscale in situations where color information is
not needed for making informed decisions.
- Visual observations should generally be used unless vector observations are
not sufficient.
- Image size should be kept as small as possible, without the loss of needed
details for decision making.
- Images should be made greyscale in situations where color information is not
needed for making informed decisions.
This can be easily implemented by adding a
`RayPerceptionSensorComponent3D` (or `RayPerceptionSensorComponent2D`) to the Agent GameObject.
This can be easily implemented by adding a `RayPerceptionSensorComponent3D` (or
`RayPerceptionSensorComponent2D`) to the Agent GameObject.
During observations, several rays (or spheres, depending on settings) are cast into
the physics world, and the objects that are hit determine the observation vector that
is produced.
During observations, several rays (or spheres, depending on settings) are cast
into the physics world, and the objects that are hit determine the observation
vector that is produced.
* _Detectable Tags_ A list of strings corresponding to the types of objects that the
Agent should be able to distinguish between. For example, in the WallJump example,
we use "wall", "goal", and "block" as the list of objects to detect.
* _Rays Per Direction_ Determines the number of rays that are cast. One ray is
- _Detectable Tags_ A list of strings corresponding to the types of objects that
the Agent should be able to distinguish between. For example, in the WallJump
example, we use "wall", "goal", and "block" as the list of objects to detect.
- _Rays Per Direction_ Determines the number of rays that are cast. One ray is
* _Max Ray Degrees_ The angle (in degrees) for the outermost rays. 90 degrees
- _Max Ray Degrees_ The angle (in degrees) for the outermost rays. 90 degrees
* _Sphere Cast Radius_ The size of the sphere used for sphere casting. If set
to 0, rays will be used instead of spheres. Rays may be more efficient,
- _Sphere Cast Radius_ The size of the sphere used for sphere casting. If set to
0, rays will be used instead of spheres. Rays may be more efficient,
* _Ray Length_ The length of the casts
* _Observation Stacks_ The number of previous results to "stack" with the cast
results. Note that this can be independent of the "Stacked Vectors" setting
in `Behavior Parameters`.
* _Start Vertical Offset_ (3D only) The vertical offset of the ray start point.
* _End Vertical Offset_ (3D only) The vertical offset of the ray end point.
- _Ray Length_ The length of the casts
- _Observation Stacks_ The number of previous results to "stack" with the cast
results. Note that this can be independent of the "Stacked Vectors" setting in
`Behavior Parameters`.
- _Start Vertical Offset_ (3D only) The vertical offset of the ray start point.
- _End Vertical Offset_ (3D only) The vertical offset of the ray end point.
Both use 3 Rays Per Direction and 90 Max Ray Degrees. One of the components
had a vertical offset, so the Agent can tell whether it's clear to jump over
the wall.
Both use 3 Rays Per Direction and 90 Max Ray Degrees. One of the components had
a vertical offset, so the Agent can tell whether it's clear to jump over the
wall.
so the number of rays and tags should be kept as small as possible to reduce the
amount of data used. Note that this is separate from the State Size defined in
`Behavior Parameters`, so you don't need to worry about the formula above when
* Attach `RayPerceptionSensorComponent3D` or `RayPerceptionSensorComponent2D` to use.
* This observation type is best used when there is relevant spatial information
- Attach `RayPerceptionSensorComponent3D` or `RayPerceptionSensorComponent2D` to
use.
- This observation type is best used when there is relevant spatial information
* Use as few rays and tags as necessary to solve the problem in order to improve learning stability and agent performance.
- Use as few rays and tags as necessary to solve the problem in order to improve
learning stability and agent performance.
agent's `OnActionReceived()` function. Actions for an agent can take one of two forms, either **Continuous** or **Discrete**.
agent's `OnActionReceived()` function. Actions for an agent can take one of two
forms, either **Continuous** or **Discrete**.
When you specify that the vector action space
is **Continuous**, the action parameter passed to the Agent is an array of
floating point numbers with length equal to the `Vector Action Space Size` property.
When you specify a **Discrete** vector action space type, the action parameter
is an array containing integers. Each integer is an index into a list or table
of commands. In the **Discrete** vector action space type, the action parameter
is an array of indices. The number of indices in the array is determined by the
number of branches defined in the `Branches Size` property. Each branch
corresponds to an action table, you can specify the size of each table by
modifying the `Branches` property.
When you specify that the vector action space is **Continuous**, the action
parameter passed to the Agent is an array of floating point numbers with length
equal to the `Vector Action Space Size` property. When you specify a
**Discrete** vector action space type, the action parameter is an array
containing integers. Each integer is an index into a list or table of commands.
In the **Discrete** vector action space type, the action parameter is an array
of indices. The number of indices in the array is determined by the number of
branches defined in the `Branches Size` property. Each branch corresponds to an
action table, you can specify the size of each table by modifying the `Branches`
property.
Neither the Policy nor the training algorithm know anything about what the action
values themselves mean. The training algorithm simply tries different values for
the action list and observes the affect on the accumulated rewards over time and
many training episodes. Thus, the only place actions are defined for an Agent is
in the `OnActionReceived()` function.
Neither the Policy nor the training algorithm know anything about what the
action values themselves mean. The training algorithm simply tries different
values for the action list and observes the affect on the accumulated rewards
over time and many training episodes. Thus, the only place actions are defined
for an Agent is in the `OnActionReceived()` function.
For example, if you designed an agent to move in two dimensions, you could use
either continuous or the discrete vector actions. In the continuous case, you
with values ranging from zero to one.
Note that when you are programming actions for an agent, it is often helpful to
test your action logic using the `Heuristic()` method of the Agent,
which lets you map keyboard commands to actions.
test your action logic using the `Heuristic()` method of the Agent, which lets
you map keyboard commands to actions.
The [3DBall](Learning-Environment-Examples.md#3dball-3d-balance-ball) and
[Area](Learning-Environment-Examples.md#push-block) example environments are set
When an Agent uses a Policy set to the **Continuous** vector action space, the
action parameter passed to the Agent's `OnActionReceived()` function is an array with
length equal to the `Vector Action Space Size` property value.
The individual values in the array have whatever meanings that you ascribe to
them. If you assign an element in the array as the speed of an Agent, for
example, the training process learns to control the speed of the Agent through
this parameter.
action parameter passed to the Agent's `OnActionReceived()` function is an array
with length equal to the `Vector Action Space Size` property value. The
individual values in the array have whatever meanings that you ascribe to them.
If you assign an element in the array as the speed of an Agent, for example, the
training process learns to control the speed of the Agent through this
parameter.
The [Reacher example](Learning-Environment-Examples.md#reacher) defines a
continuous action space with four control values.
### Discrete Action Space
When an Agent uses a **Discrete** vector action space, the
action parameter passed to the Agent's `OnActionReceived()` function is an array
containing indices. With the discrete vector action space, `Branches` is an
array of integers, each value corresponds to the number of possibilities for
each branch.
When an Agent uses a **Discrete** vector action space, the action parameter
passed to the Agent's `OnActionReceived()` function is an array containing
indices. With the discrete vector action space, `Branches` is an array of
integers, each value corresponds to the number of possibilities for each branch.
agent be able to move __and__ jump concurrently. We define the first branch to
agent be able to move **and** jump concurrently. We define the first branch to
have 5 possible actions (don't move, go left, go right, go backward, go forward)
and the second one to have 2 possible actions (don't jump, jump). The
`OnActionReceived()` method would look something like:
#### Masking Discrete Actions
When using Discrete Actions, it is possible to specify that some actions are
impossible for the next decision. When the Agent is controlled by a
neural network, the Agent will be unable to perform the specified action. Note
that when the Agent is controlled by its Heuristic, the Agent will
still be able to decide to perform the masked action. In order to mask an
action, override the `Agent.CollectDiscreteActionMasks()` virtual method,
and call `DiscreteActionMasker.SetMask()` in it:
impossible for the next decision. When the Agent is controlled by a neural
network, the Agent will be unable to perform the specified action. Note that
when the Agent is controlled by its Heuristic, the Agent will still be able to
decide to perform the masked action. In order to mask an action, override the
`Agent.CollectDiscreteActionMasks()` virtual method, and call
`DiscreteActionMasker.SetMask()` in it:
```csharp
public override void CollectDiscreteActionMasks(DiscreteActionMasker actionMasker){
Where:
*`branch` is the index (starting at 0) of the branch on which you want to mask
-`branch` is the index (starting at 0) of the branch on which you want to mask
*`actionIndices` is a list of `int` corresponding to the
indices of the actions that the Agent cannot perform.
-`actionIndices` is a list of `int` corresponding to the indices of the actions
that the Agent cannot perform.
For example, if you have an Agent with 2 branches and on the first branch
(branch 0) there are 4 possible actions : _"do nothing"_, _"jump"_, _"shoot"_
Notes:
* You can call `SetMask` multiple times if you want to put masks on
multiple branches.
* You cannot mask all the actions of a branch.
* You cannot mask actions in continuous control.
- You can call `SetMask` multiple times if you want to put masks on multiple
branches.
- You cannot mask all the actions of a branch.
- You cannot mask actions in continuous control.
### Actions Summary &Best Practices
### Actions Summary & Best Practices
* Actions can either use `Discrete` or `Continuous` spaces.
* When using `Discrete` it is possible to assign multiple action branches, and to mask certain actions.
* In general, smaller action spaces will make for easier learning.
* Be sure to set the Vector Action's Space Size to the number of used Vector
- Actions can either use `Discrete` or `Continuous` spaces.
- When using `Discrete` it is possible to assign multiple action branches, and
to mask certain actions.
- In general, smaller action spaces will make for easier learning.
- Be sure to set the Vector Action's Space Size to the number of used Vector
* When using continuous control, action values should be clipped to an
- When using continuous control, action values should be clipped to an
## Rewards
reward over time. The better your reward mechanism, the better your agent will
learn.
**Note:** Rewards are not used during inference by an Agent using a
trained model and is also not used during imitation learning.
**Note:** Rewards are not used during inference by an Agent using a trained
model and is also not used during imitation learning.
the desired results. You can even use the
Agent's Heuristic to control the Agent while watching how it accumulates rewards.
the desired results. You can even use the Agent's Heuristic to control the Agent
while watching how it accumulates rewards.
Allocate rewards to an Agent by calling the `AddReward()` or `SetReward()` methods on the agent.
The reward assigned between each decision
should be in the range [-1,1]. Values outside this range can lead to
unstable training. The `reward` value is reset to zero when the agent receives a
new decision. If there are multiple calls to `AddReward()` for a single agent
decision, the rewards will be summed together to evaluate how good the previous
decision was. The `SetReward()` will override all
previous rewards given to an agent since the previous decision.
Allocate rewards to an Agent by calling the `AddReward()` or `SetReward()`
methods on the agent. The reward assigned between each decision should be in the
range [-1,1]. Values outside this range can lead to unstable training. The
`reward` value is reset to zero when the agent receives a new decision. If there
are multiple calls to `AddReward()` for a single agent decision, the rewards
will be summed together to evaluate how good the previous decision was. The
`SetReward()` will override all previous rewards given to an agent since the
previous decision.
You can examine the `OnActionReceived()` functions defined in the [example
environments](Learning-Environment-Examples.md) to see how those projects
allocate rewards.
You can examine the `OnActionReceived()` functions defined in the
[example environments](Learning-Environment-Examples.md) to see how those
projects allocate rewards.
The `GridAgent` class in the [GridWorld
example](Learning-Environment-Examples.md#gridworld) uses a very simple reward
system:
The `GridAgent` class in the
[GridWorld example](Learning-Environment-Examples.md#gridworld) uses a very