Hotfix docs odd (#3379)

* Updating version number (#3366)
* updating version number
* fixing version numbers
* migration guide (#3375)
* Reduce num steps for walljump (#3377)
* Fixing the Docs on On Demand Decision

Co-authored-by: Anupam Bhatnagar <anupambhatnagar@gmail.com>
Co-authored-by: Chris Elion <celion@gmail.com>
Co-authored-by: Ervin T. <ervin@unity3d.com>
5 years ago · c1340b0e
--- a/com.unity.ml-agents/Runtime/Academy.cs
+++ b/com.unity.ml-agents/Runtime/Academy.cs
        "docs/Learning-Environment-Design.md")]
    public class Academy : IDisposable
    {
+
        const string k_ApiVersion = "API-15-dev0";
        const int k_EditorTrainingPort = 5004;

--- a/config/sac_trainer_config.yaml
+++ b/config/sac_trainer_config.yaml
    num_layers: 2

 SmallWallJump:
-    max_steps: 3e7
+    max_steps: 5e6
    hidden_units: 256
    summary_freq: 20000
    time_horizon: 128

 BigWallJump:
-    max_steps: 3e7
+    max_steps: 2e7
    hidden_units: 256
    summary_freq: 20000
    time_horizon: 128
--- a/config/trainer_config.yaml
+++ b/config/trainer_config.yaml
    num_layers: 2

 SmallWallJump:
-    max_steps: 3e7
+    max_steps: 5e6
    batch_size: 128
    buffer_size: 2048
    beta: 5.0e-3
    normalize: false

 BigWallJump:
-    max_steps: 3e7
+    max_steps: 2e7
    batch_size: 128
    buffer_size: 2048
    beta: 5.0e-3
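The new Wall Jump step budgets can be sanity-checked with simple arithmetic. The Python sketch below uses hypothetical helpers (`reduction_factor`, `summary_points`), not part of ML-Agents. It compares the old and new `max_steps` values from this commit and estimates how many summaries a full SAC run writes at the `summary_freq` of 20000 shown above. The summary count is an approximation that assumes one summary per `summary_freq` steps.

```python
# max_steps values before and after this commit, copied from the YAML hunks.
OLD_MAX_STEPS = {"SmallWallJump": 3e7, "BigWallJump": 3e7}
NEW_MAX_STEPS = {"SmallWallJump": 5e6, "BigWallJump": 2e7}
SUMMARY_FREQ = 20000  # summary_freq from the SAC config hunk


def reduction_factor(name: str) -> float:
    """How many times smaller the new step budget is than the old one."""
    return OLD_MAX_STEPS[name] / NEW_MAX_STEPS[name]


def summary_points(name: str, summary_freq: int = SUMMARY_FREQ) -> int:
    """Approximate number of summaries written over a full run (assumption)."""
    return int(NEW_MAX_STEPS[name] // summary_freq)


for name in NEW_MAX_STEPS:
    print(f"{name}: {reduction_factor(name):.1f}x fewer steps, "
          f"~{summary_points(name)} summaries")
```

SmallWallJump's budget shrinks sixfold (about 250 summaries per run), while BigWallJump's shrinks by only 1.5x (about 1000 summaries), reflecting that the larger wall remains the harder task.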
--- a/docs/Getting-Started-with-Balance-Ball.md
+++ b/docs/Getting-Started-with-Balance-Ball.md
  the next section.
 * **Max Step** — Defines how many simulation steps can occur before the Agent
  decides it is done. In 3D Balance Ball, an Agent restarts after 5000 steps.
-* **Reset On Done** — Defines whether an Agent starts over when it is finished.
-  3D Balance Ball sets this true so that the Agent restarts after reaching the
-  **Max Step** count or after dropping the ball.

 Perhaps the more interesting aspect of an Agent is the Agent subclass
 implementation. When you create an Agent, you must extend the base Agent class.
--- a/docs/Learning-Environment-Design-Agents.md
+++ b/docs/Learning-Environment-Design-Agents.md

 ## Decisions

-The observation-decision-action-reward cycle repeats after a configurable number
-of simulation steps (the frequency defaults to once-per-step). You can also set
-up an Agent to request decisions on demand. Making decisions at regular step
-intervals is generally most appropriate for physics-based simulations. Making
-decisions on demand is generally appropriate for situations where Agents only
-respond to specific events or take actions of variable duration. For example, an
+The observation-decision-action-reward cycle repeats each time the Agent requests
+a decision.
+An Agent requests a decision when `Agent.RequestDecision()` is called. If you need
+the Agent to request decisions on its own at regular intervals, add a
+`Decision Requester` component to the Agent's GameObject. Making decisions at regular step
+intervals is generally most appropriate for physics-based simulations. For example, an
-occur, should use on-demand decision making.
-
-To control the frequency of step-based decision making, set the **Decision
-Frequency** value for the Agent object in the Unity Inspector window. Agents
-using the same Model can use a different frequency. During simulation
-steps in which no decision is requested, the Agent receives the same action
-chosen by the previous decision.
-
-### On Demand Decision Making
-
-On demand decision making allows Agents to request decisions from their Policies
-only when needed instead of receiving decisions at a fixed frequency. This is
-useful when the agents commit to an action for a variable number of steps or
-when the agents cannot make decisions at the same time. This is typically the case
-for turn based games, games where agents must react to events or games where
-agents can take actions of variable duration.
-
-When you turn on **On Demand Decisions** for an Agent, your agent code must call
-the `Agent.RequestDecision()` function. This function call starts one iteration
-of the observation-decision-action-reward cycle. The Agent's
-`CollectObservations()` method is called, the Policy makes a decision and
-returns it by calling the
-`AgentAction()` method. The Policy waits for the Agent to request the next
-decision before starting another iteration.
+occur should call `Agent.RequestDecision()` manually.

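To make the two ways of requesting decisions concrete, here is a toy Python model of the scheduling logic. `ToyAgent`, `run_periodic` and `run_on_demand` are hypothetical names, not the Unity ML-Agents API. `run_periodic` stands in for a `Decision Requester` component and `request_decision` plays the role of `Agent.RequestDecision()`. Reusing the last action between periodic decisions is an assumption taken from the removed **Decision Frequency** paragraph.

```python
from typing import Callable, List, Optional, Set


class ToyAgent:
    """Hypothetical stand-in for a Unity Agent (not the ML-Agents API)."""

    def __init__(self, policy: Callable[[int], str]) -> None:
        self.policy = policy              # maps a step index to an action
        self.last_action: Optional[str] = None
        self.decision_pending = False
        self.log: List[str] = []

    def request_decision(self) -> None:
        # Plays the role of Agent.RequestDecision(): the next step runs
        # one observe -> decide -> act iteration.
        self.decision_pending = True

    def step(self, t: int, repeat_action: bool) -> None:
        if self.decision_pending:
            self.last_action = self.policy(t)
            self.decision_pending = False
            self.log.append(f"t={t}: decide {self.last_action}")
        elif repeat_action and self.last_action is not None:
            # Assumption: between decisions the previous action is reused.
            self.log.append(f"t={t}: repeat {self.last_action}")


def run_periodic(agent: ToyAgent, period: int, steps: int,
                 repeat_action: bool = True) -> None:
    """Toy Decision Requester: request a decision every `period` steps."""
    for t in range(steps):
        if t % period == 0:
            agent.request_decision()
        agent.step(t, repeat_action)


def run_on_demand(agent: ToyAgent, event_steps: Set[int], steps: int) -> None:
    """Event-driven agent: request a decision only when an event occurs."""
    for t in range(steps):
        if t in event_steps:
            agent.request_decision()
        agent.step(t, repeat_action=False)
```

With `period=5` over ten steps, the periodic agent decides at steps 0 and 5 and repeats its action in between. An on-demand agent given events at steps 2 and 7 decides only at those two steps and is otherwise idle, which is the pattern a turn-based or event-driven game needs.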
 ## Observations

  * `Use Heuristic` - If checked, the Agent will use its `Heuristic()` method for
  decisions.
 * `Max Step` - The per-agent maximum number of steps. Once this number is
-  reached, the Agent will be reset if `Reset On Done` is checked.
-* `Reset On Done` - Whether the Agent's `AgentReset()` function should be called
-  when the Agent reaches its `Max Step` count or is marked as done in code.
-* `On Demand Decision` - Whether the Agent requests decisions at a fixed step
-  interval or explicitly requests decisions by calling `RequestDecision()`.
-  * If not checked, the Agent will request a new decision every `Decision
-     Frequency` steps and perform an action every step. In the example above,
-     `CollectObservations()` will be called every 5 steps and `AgentAction()`
-     will be called at every step. This means that the Agent will reuse the
-     decision the Policy has given it.
-  * If checked, the Agent controls when to receive decisions, and take actions.
-     To do so, the Agent may leverage one or two methods:
-    * `RequestDecision()` Signals that the Agent is requesting a decision. This
-        causes the Agent to collect its observations and ask the Policy for a
-        decision at the next step of the simulation. Note that when an Agent
-        requests a decision, it also requests an action. This is to ensure that
-        all decisions lead to an action during training.
-    * `RequestAction()` Signals that the Agent is requesting an action. The
-        action provided to the Agent in this case is the same action that was
-        provided the last time it requested a decision.
-* `Decision Interval` - The number of steps between decision requests. Not used
-  if `On Demand Decision`, is true.
+  reached, the Agent will be reset.

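With `Reset On Done` gone, reaching `Max Step` always resets the Agent. The toy counter below (a hypothetical `EpisodeCounter`, not an ML-Agents class) models that rule: an episode ends and the step count restarts either when the limit is hit or when the agent is marked done, as when 3D Balance Ball drops the ball. Treating a `max_step` of 0 as "no limit" is an assumption about the Agent component and is labeled as such in the code.

```python
class EpisodeCounter:
    """Hypothetical per-agent step counter illustrating `Max Step`."""

    def __init__(self, max_step: int) -> None:
        # Assumption: 0 means "no limit" for the Agent component.
        self.max_step = max_step
        self.step_count = 0
        self.episodes_completed = 0

    def on_step(self, done: bool = False) -> bool:
        """Advance one step; return True if the agent was reset."""
        self.step_count += 1
        hit_limit = self.max_step > 0 and self.step_count >= self.max_step
        if done or hit_limit:
            # The episode ends and the agent always resets.
            self.episodes_completed += 1
            self.step_count = 0
            return True
        return False


# 3D Balance Ball uses a Max Step of 5000.
counter = EpisodeCounter(max_step=5000)
resets = sum(counter.on_step() for _ in range(12000))
print(resets, counter.step_count)
```

Over 12000 steps the counter resets at steps 5000 and 10000, leaving 2000 steps into the third episode.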
 ## Monitoring Agents

--- a/docs/ML-Agents-Overview.md
+++ b/docs/ML-Agents-Overview.md
 additional features which improve the flexibility and interpretability of the
 training process.

- **On Demand Decision Making** - With the ML-Agents toolkit it is possible to
-  have agents request decisions only when needed as opposed to requesting
-  decisions at every step of the environment. This enables training of turn
-  based games, games where agents must react to events or games where agents can
-  take actions of variable duration. Switching between decision taking at every
-  step and on-demand-decision is one button click away. You can learn more about
-  the on-demand-decision feature
-  [here](Learning-Environment-Design-Agents.md#on-demand-decision-making).
-
 - **Memory-enhanced Agents** - In some scenarios, agents must learn to remember
  the past in order to make the best decision. When an agent only has partial
  observability of the environment, keeping track of past observations can help