Hotfix docs odd (#3379)

* Updating version number (#3366)

* updating version number

* fixing version numbers

* migration guide (#3375)

* Reduce num steps for walljump (#3377)

* Fixing the Docs on On Demand Decision

Co-authored-by: Anupam Bhatnagar <anupambhatnagar@gmail.com>
Co-authored-by: Chris Elion <celion@gmail.com>
Co-authored-by: Ervin T. <ervin@unity3d.com>
/asymm-envs
GitHub · 5 years ago
Current commit: c1340b0e
6 files changed: 13 insertions(+), 68 deletions(-)
  1. com.unity.ml-agents/Runtime/Academy.cs (1 change)
  2. config/sac_trainer_config.yaml (4 changes)
  3. config/trainer_config.yaml (4 changes)
  4. docs/Getting-Started-with-Balance-Ball.md (3 changes)
  5. docs/Learning-Environment-Design-Agents.md (60 changes)
  6. docs/ML-Agents-Overview.md (9 changes)

com.unity.ml-agents/Runtime/Academy.cs (1 change)

        "docs/Learning-Environment-Design.md")]
    public class Academy : IDisposable
    {
        const string k_ApiVersion = "API-15-dev0";
        const int k_EditorTrainingPort = 5004;

config/sac_trainer_config.yaml (4 changes)

        num_layers: 2

    SmallWallJump:
-       max_steps: 3e7
+       max_steps: 5e6
        hidden_units: 256
        summary_freq: 20000
        time_horizon: 128

    BigWallJump:
-       max_steps: 3e7
+       max_steps: 2e7
        hidden_units: 256
        summary_freq: 20000
        time_horizon: 128

config/trainer_config.yaml (4 changes)

        num_layers: 2

    SmallWallJump:
-       max_steps: 3e7
+       max_steps: 5e6
        batch_size: 128
        buffer_size: 2048
        beta: 5.0e-3

        normalize: false
    BigWallJump:
-       max_steps: 3e7
+       max_steps: 2e7
        batch_size: 128
        buffer_size: 2048
        beta: 5.0e-3

docs/Getting-Started-with-Balance-Ball.md (3 changes)

    the next section.
    * **Max Step** — Defines how many simulation steps can occur before the Agent
      decides it is done. In 3D Balance Ball, an Agent restarts after 5000 steps.
-   * **Reset On Done** — Defines whether an Agent starts over when it is finished.
-     3D Balance Ball sets this true so that the Agent restarts after reaching the
-     **Max Step** count or after dropping the ball.

    Perhaps the more interesting aspect of an agent is the Agent subclass
    implementation. When you create an Agent, you must extend the base Agent class.
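
The tutorial's own subclass defines observation, action, and reset behavior. As
a hedged, minimal sketch of what such a subclass looks like (the class name and
method bodies are illustrative, and the `MLAgents` namespace and exact
signatures are assumed from this release's API rather than taken from the diff):

    using MLAgents;

    // Hypothetical minimal Agent subclass; not the tutorial's actual code.
    public class MinimalAgent : Agent
    {
        public override void CollectObservations()
        {
            // Gather the state the Policy will see.
            AddVectorObs(0f); // placeholder observation
        }

        public override void AgentAction(float[] vectorAction)
        {
            // Apply the decided action; assign rewards with AddReward()/SetReward().
        }

        public override void AgentReset()
        {
            // Restore the starting state when the Agent is done.
        }
    }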

docs/Learning-Environment-Design-Agents.md (60 changes)

    ## Decisions

-   The observation-decision-action-reward cycle repeats after a configurable number
-   of simulation steps (the frequency defaults to once-per-step). You can also set
-   up an Agent to request decisions on demand. Making decisions at regular step
-   intervals is generally most appropriate for physics-based simulations. Making
-   decisions on demand is generally appropriate for situations where Agents only
-   respond to specific events or take actions of variable duration. For example, an
+   The observation-decision-action-reward cycle repeats each time the Agent
+   requests a decision. Agents will request a decision when
+   `Agent.RequestDecision()` is called. If you need the Agent to request decisions
+   on its own at regular intervals, add a `Decision Requester` component to the
+   Agent's Game Object. Making decisions at regular step intervals is generally
+   most appropriate for physics-based simulations. For example, an
    agent in a robotic simulator that must provide fine-control of joint torques
    should make its decisions every step of the simulation. On the other hand, an
    agent that only needs to make decisions when certain game or simulation events
-   occur, should use on-demand decision making.
-
-   To control the frequency of step-based decision making, set the **Decision
-   Frequency** value for the Agent object in the Unity Inspector window. Agents
-   using the same Model can use different frequencies. During simulation steps in
-   which no decision is requested, the Agent receives the same action chosen by
-   the previous decision.
-
-   ### On Demand Decision Making
-
-   On demand decision making allows Agents to request decisions from their
-   Policies only when needed, instead of receiving decisions at a fixed frequency.
-   This is useful when agents commit to an action for a variable number of steps
-   or when agents cannot all make decisions at the same time. This is typically
-   the case for turn-based games, games where agents must react to events, or
-   games where agents can take actions of variable duration.
-
-   When you turn on **On Demand Decisions** for an Agent, your agent code must call
-   the `Agent.RequestDecision()` function. This function call starts one iteration
-   of the observation-decision-action-reward cycle. The Agent's
-   `CollectObservations()` method is called, the Policy makes a decision and
-   returns it by calling the `AgentAction()` method. The Policy waits for the
-   Agent to request the next decision before starting another iteration.
+   occur, should call `Agent.RequestDecision()` manually.
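
As a rough illustration of the updated workflow described above, here is a
sketch of an event-driven Agent that asks for a decision only when game code
signals it (only `Agent.RequestDecision()` comes from the text; the class, the
event method, and the `MLAgents` namespace are assumptions for this release):

    using MLAgents;

    public class EventDrivenAgent : Agent
    {
        // Called by game code when the event this agent reacts to occurs.
        public void OnGameEvent()
        {
            // Starts one observation-decision-action-reward iteration.
            RequestDecision();
        }
    }

For decisions at fixed intervals, the updated text instead recommends adding a
`Decision Requester` component to the Agent's Game Object.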
    ## Observations

    * `Use Heuristic` - If checked, the Agent will use its `Heuristic()` method
      for decisions.
    * `Max Step` - The per-agent maximum number of steps. Once this number is
-     reached, the Agent will be reset if `Reset On Done` is checked.
-   * `Reset On Done` - Whether the Agent's `AgentReset()` function should be
-     called when the Agent reaches its `Max Step` count or is marked as done in
-     code.
-   * `On Demand Decision` - Whether the Agent requests decisions at a fixed step
-     interval or explicitly requests decisions by calling `RequestDecision()`.
-     * If not checked, the Agent will request a new decision every `Decision
-       Frequency` steps and perform an action every step. In the example above,
-       `CollectObservations()` will be called every 5 steps and `AgentAction()`
-       will be called at every step. This means that the Agent will reuse the
-       decision the Policy has given it.
-     * If checked, the Agent controls when to receive decisions and take
-       actions. To do so, the Agent may leverage one or two methods (see the
-       sketch after this list):
-       * `RequestDecision()` - Signals that the Agent is requesting a decision.
-         This causes the Agent to collect its observations and ask the Policy
-         for a decision at the next step of the simulation. Note that when an
-         Agent requests a decision, it also requests an action. This is to
-         ensure that all decisions lead to an action during training.
-       * `RequestAction()` - Signals that the Agent is requesting an action. The
-         action provided to the Agent in this case is the same action that was
-         provided the last time it requested a decision.
-   * `Decision Interval` - The number of steps between decision requests. Not
-     used if `On Demand Decision` is true.
+     reached, the Agent will be reset.
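
A hedged sketch of how the two calls described in the (now-removed) list above
interact for variable-duration actions; the class, the use of FixedUpdate for
stepping, and the fixed four-step duration are all illustrative assumptions,
and only `RequestDecision()` and `RequestAction()` come from the text:

    using MLAgents;

    public class VariableDurationAgent : Agent
    {
        int m_StepsRemaining; // steps left in the currently committed action

        void FixedUpdate()
        {
            if (m_StepsRemaining <= 0)
            {
                // Collect observations and ask the Policy for a fresh decision.
                RequestDecision();
                m_StepsRemaining = 4; // assume each decided action lasts 4 steps
            }
            else
            {
                // Reuse the action from the last decision; no new inference.
                RequestAction();
            }
            m_StepsRemaining--;
        }
    }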
    ## Monitoring Agents

docs/ML-Agents-Overview.md (9 changes)

    additional features which improve the flexibility and interpretability of the
    training process.

-   - **On Demand Decision Making** - With the ML-Agents toolkit it is possible to
-     have agents request decisions only when needed, as opposed to requesting
-     decisions at every step of the environment. This enables training of
-     turn-based games, games where agents must react to events, or games where
-     agents can take actions of variable duration. Switching between decision
-     taking at every step and on-demand decision making is one button click away.
-     You can learn more about the on-demand decision feature
-     [here](Learning-Environment-Design-Agents.md#on-demand-decision-making).
    - **Memory-enhanced Agents** - In some scenarios, agents must learn to remember
      the past in order to take the best decision. When an agent only has partial
      observability of the environment, keeping track of past observations can help