## Decisions

The observation-decision-action-reward cycle repeats after a configurable number
of simulation steps (the frequency defaults to once per step). You can also set
up an Agent to request decisions on demand. Making decisions at regular step
intervals is generally most appropriate for physics-based simulations. Making
decisions on demand is generally appropriate for situations where Agents only
respond to specific events or take actions of variable duration. For example, an

The observation-decision-action-reward cycle repeats each time the Agent requests
a decision.
An Agent requests a decision when `Agent.RequestDecision()` is called. If you need
the Agent to request decisions on its own at regular intervals, add a
`Decision Requester` component to the Agent's GameObject. Making decisions at regular step
intervals is generally most appropriate for physics-based simulations. For example, an

occur, should use on-demand decision making.

To control the frequency of step-based decision making, set the **Decision
Frequency** value for the Agent object in the Unity Inspector window. Agents
that use the same Model can use different frequencies. During simulation
steps in which no decision is requested, the Agent receives the same action
chosen by the previous decision.
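The step-based schedule described above can be modeled outside Unity. The following is a minimal Python sketch, not ML-Agents code: `run_fixed_interval`, `toy_policy`, and the choice to make the first decision on step 0 are all assumptions made for illustration. It shows a new decision being taken every `decision_frequency` steps while the most recent action is reapplied on the steps in between.

```python
from typing import Callable, List, Optional, Tuple


def run_fixed_interval(
    num_steps: int,
    decision_frequency: int,
    policy: Callable[[int], str],
) -> List[Tuple[int, bool, Optional[str]]]:
    """Model step-based decision making (hypothetical helper, not ML-Agents API).

    Returns one (step, decided, action) record per simulation step.
    """
    if decision_frequency < 1:
        raise ValueError("decision_frequency must be at least 1")

    records: List[Tuple[int, bool, Optional[str]]] = []
    current_action: Optional[str] = None
    for step in range(num_steps):
        # Assumption: decisions fall on steps 0, N, 2N, ...
        decided = step % decision_frequency == 0
        if decided:
            observation = step  # stand-in for collected observations
            current_action = policy(observation)
        # An action is applied on every step; between decisions it is the
        # action chosen by the previous decision.
        records.append((step, decided, current_action))
    return records


def toy_policy(observation: int) -> str:
    """Hypothetical policy that labels its action with the observed step."""
    return f"action@{observation}"


if __name__ == "__main__":
    for step, decided, action in run_fixed_interval(7, 3, toy_policy):
        print(step, "decide" if decided else "repeat", action)
```

With a frequency of 3, only every third step triggers the policy; the other steps reuse the stored action, which mirrors the behavior the paragraph describes.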
### On Demand Decision Making

On demand decision making allows Agents to request decisions from their Policies
only when needed instead of receiving decisions at a fixed frequency. This is
useful when the agents commit to an action for a variable number of steps or
when the agents cannot make decisions at the same time. This is typically the case
for turn-based games, games where agents must react to events, or games where
agents can take actions of variable duration.

When you turn on **On Demand Decisions** for an Agent, your agent code must call
the `Agent.RequestDecision()` function. This function call starts one iteration
of the observation-decision-action-reward cycle: the Agent's
`CollectObservations()` method is called, and the Policy makes a decision and
returns it by calling the
`AgentAction()` method. The Policy waits for the Agent to request the next
decision before starting another iteration.

occur, should call `Agent.RequestDecision()` manually.
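To illustrate when an on-demand Agent would ask for a decision, here is a small Python model of an agent that commits to actions of variable duration. It is a sketch under stated assumptions rather than ML-Agents code: `run_on_demand`, `toy_policy`, and the `(action, duration)` return shape are hypothetical names and conventions chosen for this example.

```python
from typing import Callable, List, Optional, Tuple

Decision = Tuple[str, int]  # (action name, duration in steps); hypothetical shape


def run_on_demand(
    num_steps: int,
    policy: Callable[[int], Decision],
) -> List[Tuple[int, bool, Optional[str]]]:
    """Model on-demand decisions: request one only when the current action ends."""
    records: List[Tuple[int, bool, Optional[str]]] = []
    action: Optional[str] = None
    remaining = 0  # steps left in the action the agent has committed to
    for step in range(num_steps):
        requested = remaining == 0
        if requested:
            # Stand-in for calling RequestDecision(): observe, then decide.
            action, remaining = policy(step)
            if remaining < 1:
                raise ValueError("an action must last at least one step")
        records.append((step, requested, action))
        remaining -= 1
    return records


def toy_policy(observation: int) -> Decision:
    """Hypothetical policy: move for 3 steps on even steps, else wait for 1."""
    return ("move", 3) if observation % 2 == 0 else ("wait", 1)


if __name__ == "__main__":
    for step, requested, action in run_on_demand(8, toy_policy):
        print(step, "request" if requested else "continue", action)
```

Unlike a fixed interval, the gaps between requests here depend on the action the policy chose, which is the situation on-demand decision making is meant for.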
## Observations

* `Use Heuristic` - If checked, the Agent will use its `Heuristic()` method for
  decisions.
* `Max Step` - The per-agent maximum number of steps. Once this number is
  reached, the Agent will be reset if `Reset On Done` is checked.
* `Reset On Done` - Whether the Agent's `AgentReset()` function should be called
  when the Agent reaches its `Max Step` count or is marked as done in code.
* `On Demand Decision` - Whether the Agent requests decisions at a fixed step
  interval or explicitly requests decisions by calling `RequestDecision()`.
  * If not checked, the Agent will request a new decision every
    `Decision Frequency` steps and perform an action every step. In the example above,
    `CollectObservations()` will be called every 5 steps and `AgentAction()`
    will be called at every step. This means that the Agent will reuse the
    decision the Policy has given it.
  * If checked, the Agent controls when to receive decisions and take actions.
    To do so, the Agent may use one or both of the following methods:
    * `RequestDecision()` signals that the Agent is requesting a decision. This
      causes the Agent to collect its observations and ask the Policy for a
      decision at the next step of the simulation. Note that when an Agent
      requests a decision, it also requests an action. This is to ensure that
      all decisions lead to an action during training.
    * `RequestAction()` signals that the Agent is requesting an action. The
      action provided to the Agent in this case is the same action that was
      provided the last time it requested a decision.
* `Decision Interval` - The number of steps between decision requests. Not used
  if `On Demand Decision` is true.

reached, the Agent will be reset.
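The difference between `RequestDecision()` and `RequestAction()` described in the list above can be sketched with a small Python model. This is an illustration only: `AgentModel`, its methods, and its `log` attribute are hypothetical and do not reproduce the ML-Agents `Agent` class. It shows that a decision request also implies an action request, and that an action request on its own reuses the last decided action.

```python
from typing import Any, Callable, List, Tuple


class AgentModel:
    """Hypothetical model of the request flags; not the ML-Agents Agent class."""

    def __init__(self, policy: Callable[[Any], Any]) -> None:
        self._policy = policy
        self._decision_requested = False
        self._action_requested = False
        self._last_action: Any = None  # None until the first decision is made
        self.log: List[Tuple[str, Any]] = []

    def request_decision(self) -> None:
        # Requesting a decision also requests an action, so that every
        # decision leads to an action.
        self._decision_requested = True
        self._action_requested = True

    def request_action(self) -> None:
        # Only reapply the action from the most recent decision.
        self._action_requested = True

    def step(self, observation: Any) -> None:
        """Process pending requests at the next simulation step."""
        if self._decision_requested:
            self._last_action = self._policy(observation)
            self.log.append(("decision", observation))
        if self._action_requested:
            self.log.append(("action", self._last_action))
        self._decision_requested = False
        self._action_requested = False


if __name__ == "__main__":
    agent = AgentModel(lambda obs: obs * 2)  # toy policy, an assumption
    agent.request_decision()
    agent.step(1)  # new decision, then its action
    agent.step(2)  # nothing requested, nothing happens
    agent.request_action()
    agent.step(3)  # reuses the action from the last decision
    print(agent.log)
```

In this model a step with no pending request does nothing, which is one way to read "the Agent controls when to receive decisions and take actions".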
## Monitoring Agents