
Review fixes; added decision section to Agents doc.

/develop-generalizationTraining-TrainerController
Joe Ward, 7 years ago
Current commit
36a95b8e
4 files changed, with 35 insertions and 25 deletions
  1. docs/Feature-On-Demand-Decision.md (4 lines changed)
  2. docs/Getting-Started-with-Balance-Ball.md (6 lines changed)
  3. docs/Learning-Environment-Design-Agents.md (38 lines changed)
  4. docs/Learning-Environment-Design.md (12 lines changed)

4
docs/Feature-On-Demand-Decision.md


request a new decision every `Decision Frequency` steps and
perform an action every step. In the example above,
`CollectObservations()` will be called every 5 steps and
`AgentAct()` will be called at every step. This means that the
`AgentAction()` will be called at every step. This means that the
agent will reuse the decision the brain has given it.
* If `On Demand Decision` is checked, you are in charge of telling

* `RequestAction()` Call this method to signal the agent that
it must reuse its previous action at the next step of the
simulation. The Agent will not ask the brain for a new decision,
it will just call `AgentAct()` with the same action.
it will just call `AgentAction()` with the same action.

6
docs/Getting-Started-with-Balance-Ball.md


* Academy.InitializeAcademy() — Called once when the environment is launched.
* Academy.AcademyStep() — Called at every simulation step before
Agent.AgentAct() (and after the agents collect their observations).
Agent.AgentAction() (and after the agents collect their observations).
* Academy.AcademyReset() — Called when the Academy starts or restarts the
simulation (including the first time).

instance assigned to the agent is set to the continuous vector observation
space with a state size of 8, the `CollectObservations()` must call
`AddVectorObs` 8 times.
* Agent.AgentAct() — Called every simulation step. Receives the action chosen
* Agent.AgentAction() — Called every simulation step. Receives the action chosen
small change in platform rotation at each step. The `AgentAct()` function
small change in platform rotation at each step. The `AgentAction()` function
assigns a reward to the agent; in this example, an agent receives a small
positive reward for each step it keeps the ball on the platform and a larger,
negative reward for dropping the ball. An agent is also marked as done when it

38
docs/Learning-Environment-Design-Agents.md


An agent is an actor that can observe its environment and decide on the best course of action using those observations. Create agents in Unity by extending the Agent class. The most important aspects of creating agents that can successfully learn are the observations the agent collects and, for reinforcement learning, the reward you assign to estimate the value of the agent's current state toward accomplishing its tasks.
In the ML-Agents framework, an agent passes its observations to its brain at each simulation step. The brain, then, makes a decision and passes the chosen action back to the agent. The agent code executes the action, for example, it moves the agent in one direction or another. In order to train an agent using [reinforcement learning](Learning-Environment-Design.md), the agent must calculate a reward value at each action. The reward is used to discover the optimal decision-making policy. (A reward is not used by already trained agents.)
An agent passes its observations to its brain. The brain, then, makes a decision and passes the chosen action back to the agent. Your agent code must execute the action, for example, move the agent in one direction or another. In order to train an agent using [reinforcement learning](Learning-Environment-Design.md), your agent must calculate a reward value at each action. The reward is used to discover the optimal decision-making policy. (A reward is not used by already trained agents or for imitation learning.)
## Decisions
The observation-decision-action-reward cycle repeats after a configurable number of simulation steps (the frequency defaults to once per step). You can also set up an agent to request decisions on demand. Making decisions at regular step intervals is generally most appropriate for physics-based simulations. Making decisions on demand is generally appropriate for situations where agents only respond to specific events or take actions of variable duration. For example, an agent in a robotic simulator that must provide fine control of joint torques should make its decisions every step of the simulation. On the other hand, an agent that only needs to make decisions when certain game or simulation events occur should use on-demand decision making.
To control the frequency of step-based decision making, set the **Decision Frequency** value for the Agent object in the Unity Inspector window. Agents using the same Brain instance can use a different frequency. During simulation steps in which no decision is requested, the agent receives the same action chosen by the previous decision.
When you turn on **On Demand Decisions** for an agent, your agent code must call the `Agent.RequestDecision()` function. This function call starts one iteration of the observation-decision-action-reward cycle. The Brain invokes the agent's `CollectObservations()` method, makes a decision and returns it by calling the `AgentAction()` method. The Brain waits for the agent to request the next decision before starting another iteration.
See [On Demand Decision Making](Feature-On-Demand-Decision.md).
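
The two decision modes described above can be sketched in plain C#. This is a minimal illustration, not the ML-Agents implementation: `DecisionSchedule`, `ShouldDecide()`, and `DecisionScheduleDemo` are hypothetical names, and the choices to decide on step 0 and to use a modulo test for the fixed interval are assumptions made for the example.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the scheduling rule described above; not the ML-Agents Agent class.
public class DecisionSchedule
{
    public bool OnDemandDecision;      // mirrors the "On Demand Decision" checkbox
    public int DecisionFrequency = 1;  // mirrors "Decision Frequency" (steps between decisions)

    private bool requested;            // set by RequestDecision(), cleared once consumed

    // In on-demand mode, agent code calls this to ask for one new decision.
    public void RequestDecision() { requested = true; }

    // Returns true when a new decision should be made at the given step.
    // Assumption: fixed-interval agents decide on step 0 and every DecisionFrequency steps after.
    public bool ShouldDecide(int step)
    {
        if (OnDemandDecision)
        {
            bool decide = requested;
            requested = false;
            return decide;
        }
        return step % DecisionFrequency == 0;
    }
}

public static class DecisionScheduleDemo
{
    // Lists the steps in [0, stepCount) at which a fixed-interval agent decides.
    public static List<int> DecisionSteps(int frequency, int stepCount)
    {
        var schedule = new DecisionSchedule { DecisionFrequency = frequency };
        var steps = new List<int>();
        for (int step = 0; step < stepCount; step++)
        {
            if (schedule.ShouldDecide(step)) steps.Add(step);
        }
        return steps;
    }
}
```

Under these assumptions, a frequency of 5 over ten steps yields decisions at steps 0 and 5. On the steps in between, the agent would reuse the previous action. In on-demand mode nothing is decided until `RequestDecision()` is called, and each call is consumed by exactly one decision.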
* **Continuous** — a feature vector consisting of an array of numbers.
* **Discrete** — an index into a state table (typically only useful for the simplest of environments).
* **Camera** — one or more camera images.
* **Continuous Vector** — a feature vector consisting of an array of numbers.
* **Discrete Vector** — an index into a state table (typically only useful for the simplest of environments).
* **Visual Observations** — one or more camera images.
When you use the **Continuous** or **Discrete** vector observation space for an agent, implement the `Agent.CollectObservations()` method to create the feature vector or state index. When you use camera observations, you only need to identify which Unity Camera objects will provide images and the base Agent class handles the rest. You do not need to implement the `CollectObservations()` method.
When you use the **Continuous** or **Discrete** vector observation space for an agent, implement the `Agent.CollectObservations()` method to create the feature vector or state index. When you use **Visual Observations**, you only need to identify which Unity Camera objects will provide images and the base Agent class handles the rest. You do not need to implement the `CollectObservations()` method when your agent uses visual observations (unless it also uses vector observations).
### Continuous Vector Observation Space: Feature Vectors

## Vector Actions
An action is an instruction from the brain that the agent carries out. The action is passed to the agent as a parameter when the Academy invokes the agent's `AgentAct()` function. When you specify that the vector action space is **Continuous**, the action parameter passed to the agent is an array of control signals with length equal to the `Vector Action Space Size` property. When you specify a **Discrete** vector action space type, the action parameter is an array containing only a single value, which is an index into your list or table of commands. In the **Discrete** vector action space type, the `Vector Action Space Size` is the number of elements in your action table. Set the `Vector Action Space Size` and `Vector Action Space Type` properties on the Brain object assigned to the agent (using the Unity Editor Inspector window).
An action is an instruction from the brain that the agent carries out. The action is passed to the agent as a parameter when the Academy invokes the agent's `AgentAction()` function. When you specify that the vector action space is **Continuous**, the action parameter passed to the agent is an array of control signals with length equal to the `Vector Action Space Size` property. When you specify a **Discrete** vector action space type, the action parameter is an array containing only a single value, which is an index into your list or table of commands. In the **Discrete** vector action space type, the `Vector Action Space Size` is the number of elements in your action table. Set the `Vector Action Space Size` and `Vector Action Space Type` properties on the Brain object assigned to the agent (using the Unity Editor Inspector window).
Neither the Brain nor the training algorithm knows anything about what the action values themselves mean. The training algorithm simply tries different values for the action list and observes the effect on the accumulated rewards over time and many training episodes. Thus, the only place actions are defined for an agent is in the `AgentAct()` function. You simply specify the type of vector action space, and, for the continuous vector action space, the number of values, and then apply the received values appropriately (and consistently) in `AgentAct()`.
Neither the Brain nor the training algorithm knows anything about what the action values themselves mean. The training algorithm simply tries different values for the action list and observes the effect on the accumulated rewards over time and many training episodes. Thus, the only place actions are defined for an agent is in the `AgentAction()` function. You simply specify the type of vector action space, and, for the continuous vector action space, the number of values, and then apply the received values appropriately (and consistently) in `AgentAction()`.
For example, if you designed an agent to move in two dimensions, you could use either continuous or discrete vector actions. In the continuous case, you would set the vector action size to two (one for each dimension), and the agent's brain would create an action with two floating point values. In the discrete case, you would set the vector action size to four (one for each direction), and the brain would create an action array containing a single element with a value ranging from zero to three.
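
The two-dimensional movement example can be sketched as follows. This is an illustration under stated assumptions rather than code from any ML-Agents example: `MoveActions` and its methods are hypothetical helpers, the order of the four directions in the discrete table is arbitrary, and the clamp to [-1, 1] in the continuous case is a choice made for the sketch.

```csharp
using System;

// Hypothetical helper, not part of ML-Agents: turns an action array into a 2D move.
public static class MoveActions
{
    // Continuous case: act holds two values, one per axis.
    // Assumption: each value is clamped to [-1, 1] before use.
    public static (float x, float z) FromContinuous(float[] act)
    {
        float x = Math.Max(-1f, Math.Min(1f, act[0]));
        float z = Math.Max(-1f, Math.Min(1f, act[1]));
        return (x, z);
    }

    // Discrete case: act holds a single element, an index into a four-entry table.
    public static (float x, float z) FromDiscrete(float[] act)
    {
        int index = (int)act[0];
        switch (index)
        {
            case 0: return (0f, 1f);   // forward
            case 1: return (0f, -1f);  // back
            case 2: return (-1f, 0f);  // left
            case 3: return (1f, 0f);   // right
            default:
                throw new ArgumentOutOfRangeException(nameof(act), "expected an index from 0 to 3");
        }
    }
}
```

In an agent, the returned pair would be applied inside the action callback. The point of the sketch is that the meaning of each value or index lives only in your own code, so the mapping must stay the same across training and inference.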

### Continuous Action Space
When an agent uses a brain set to the **Continuous** vector action space, the action parameter passed to the agent's `AgentAct()` function is an array with length equal to the Brain object's `Vector Action Space Size` property value. The individual values in the array have whatever meanings that you ascribe to them. If you assign an element in the array as the speed of an agent, for example, the training process learns to control the speed of the agent through this parameter.
When an agent uses a brain set to the **Continuous** vector action space, the action parameter passed to the agent's `AgentAction()` function is an array with length equal to the Brain object's `Vector Action Space Size` property value. The individual values in the array have whatever meanings that you ascribe to them. If you assign an element in the array as the speed of an agent, for example, the training process learns to control the speed of the agent through this parameter.
The [Reacher example](Learning-Environment-Examples.md) defines a continuous action space with four control values.

public override void AgentAct(float[] act)
public override void AgentAction(float[] act)
{
float torque_x = Mathf.Clamp(act[0], -1, 1) * 100f;
float torque_z = Mathf.Clamp(act[1], -1, 1) * 100f;

### Discrete Action Space
When an agent uses a brain set to the **Discrete** vector action space, the action parameter passed to the agent's `AgentAct()` function is an array containing a single element. The value is the index of the action in your table or list of actions. With the discrete vector action space, `Vector Action Space Size` represents the number of actions in your action table.
When an agent uses a brain set to the **Discrete** vector action space, the action parameter passed to the agent's `AgentAction()` function is an array containing a single element. The value is the index of the action in your table or list of actions. With the discrete vector action space, `Vector Action Space Size` represents the number of actions in your action table.
The [Area example](Learning-Environment-Examples.md) defines five actions for the discrete vector action space: a jump action and one action for each cardinal direction:

Perhaps the best advice is to start simple and only add complexity as needed. In general, you should reward results rather than actions you think will lead to the desired results. To help develop your rewards, you can use the Monitor class to display the cumulative reward received by an agent. You can even use a Player brain to control the agent while watching how it accumulates rewards.
Allocate rewards to an agent by calling the `AddReward()` method in the `AgentAct()` function. The reward assigned in any step should be in the range [-1,1]. Values outside this range can lead to unstable training. The `reward` value is reset to zero at every step.
Allocate rewards to an agent by calling the `AddReward()` method in the `AgentAction()` function. The reward assigned in any step should be in the range [-1,1]. Values outside this range can lead to unstable training. The `reward` value is reset to zero at every step.
You can examine the `AgentAct()` functions defined in the [Examples](Learning-Environment-Examples.md) to see how those projects allocate rewards.
You can examine the `AgentAction()` functions defined in the [Examples](Learning-Environment-Examples.md) to see how those projects allocate rewards.
The `GridAgent` class in the [GridWorld example](Learning-Environment-Examples.md) uses a very simple reward system:

* `Visual Observations` - A list of `Cameras` which will be used to generate observations.
* `Max Step` - The per-agent maximum number of steps. Once this number is reached, the agent will be reset if `Reset On Done` is checked.
* `Reset On Done` - Whether the agent's `AgentReset()` function should be called when the agent reaches its `Max Step` count or is marked as done in code.
* `On Demand Decision` - Whether the agent will request decision at a fixed frequency or if he will be manually have to request decisions with `RequestDecision()`
* `Decision Frequency` - If the agent is not `On Demand Decision`, this is the number of steps between decision requests.
* `On Demand Decision` - Whether the agent requests decisions at a fixed step interval or explicitly requests decisions by calling `RequestDecision()`.
* `Decision Frequency` - The number of steps between decision requests. Not used if `On Demand Decision` is true.
## Instantiating an Agent at Runtime

12
docs/Learning-Environment-Design.md


3. Calls the `CollectObservations()` function for each agent in the scene.
4. Uses each agent's Brain class to decide on the agent's next action.
5. Calls your subclass's `AcademyStep()` function.
6. Calls the `AgentAct()` function for each agent in the scene, passing in the action chosen by the agent's brain. (This function is not called if the agent is done.)
6. Calls the `AgentAction()` function for each agent in the scene, passing in the action chosen by the agent's brain. (This function is not called if the agent is done.)
To create a training environment, extend the Academy and Agent classes to implement the above methods. The `Agent.CollectObservations()` and `Agent.AgentAct()` functions are required; the other methods are optional — whether you need to implement them or not depends on your specific scenario.
To create a training environment, extend the Academy and Agent classes to implement the above methods. The `Agent.CollectObservations()` and `Agent.AgentAction()` functions are required; the other methods are optional — whether you need to implement them or not depends on your specific scenario.
**Note:** The API used by the Python PPO training process to communicate with and control the Academy during training can be used for other purposes as well. For example, you could use the API to use Unity as the simulation engine for your own machine learning algorithms. See [External ML API](Python-API.md) for more information.
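
The ordering of steps 3 to 6 above can be made concrete with a small trace. This is only a sketch: `StepOrderSketch` is a hypothetical class that records names as strings and makes no calls into the real Academy or Agent classes, `BrainDecide` is a placeholder label for the brain's decision, and collecting observations for agents that are already done is an assumption.

```csharp
using System.Collections.Generic;

// Hypothetical trace of the per-step call order; stand-in names only,
// not calls into the real Academy, Brain, or Agent classes.
public static class StepOrderSketch
{
    public static List<string> RunStep(IList<string> agentNames, ISet<string> doneAgents)
    {
        var calls = new List<string>();

        // Step 3: every agent collects its observations.
        foreach (var name in agentNames) calls.Add(name + ".CollectObservations");

        // Step 4: each agent's brain decides on the next action.
        foreach (var name in agentNames) calls.Add(name + ".BrainDecide");

        // Step 5: the Academy subclass updates the environment.
        calls.Add("Academy.AcademyStep");

        // Step 6: agents that are not done carry out their actions.
        foreach (var name in agentNames)
        {
            if (!doneAgents.Contains(name)) calls.Add(name + ".AgentAction");
        }
        return calls;
    }
}
```

The trace shows that the Academy's per-step hook runs after all observations and decisions but before any agent acts, which is why it is a suitable place to update other scene objects.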

* `InitializeAcademy()` — Prepare the environment the first time it launches.
* `AcademyReset()` — Prepare the environment and agents for the next training episode. Use this function to place and initialize entities in the scene as necessary.
* `AcademyStep()` — Prepare the environment for the next simulation step. The base Academy class calls this function before calling any `AgentAct()` methods for the current step. You can use this function to update other objects in the scene before the agents take their actions. Note that the agents have already collected their observations and chosen an action before the Academy invokes this method.
* `AcademyStep()` — Prepare the environment for the next simulation step. The base Academy class calls this function before calling any `AgentAction()` methods for the current step. You can use this function to update other objects in the scene before the agents take their actions. Note that the agents have already collected their observations and chosen an action before the Academy invokes this method.
The base Academy class also defines several important properties that you can set in the Unity Editor Inspector. For training, the most important of these properties is `Max Steps`, which determines how long each training episode lasts. Once the Academy's step counter reaches this value, it calls the `AcademyReset()` function to start the next episode.

The Agent class represents an actor in the scene that collects observations and carries out actions. The Agent class is typically attached to the GameObject in the scene that otherwise represents the actor — for example, to a player object in a football game or a car object in a vehicle simulation. Every Agent must be assigned a Brain.
To create an agent, extend the Agent class and implement the essential `CollectObservations()` and `AgentAct()` methods:
To create an agent, extend the Agent class and implement the essential `CollectObservations()` and `AgentAction()` methods:
* `AgentAct()` — Carries out the action chosen by the agent's brain and assigns a reward to the current state.
* `AgentAction()` — Carries out the action chosen by the agent's brain and assigns a reward to the current state.
You must also determine how an Agent finishes its task or times out. You can manually set an agent to done in your `AgentAct()` function when the agent has finished (or irrevocably failed) its task. You can also set the agent's `Max Steps` property to a positive value and the agent will consider itself done after it has taken that many steps. When the Academy reaches its own `Max Steps` count, it starts the next episode. If you set an agent's `ResetOnDone` property to true, then the agent can attempt its task several times in one episode. (Use the `Agent.AgentReset()` function to prepare the agent to start again.)
You must also determine how an Agent finishes its task or times out. You can manually set an agent to done in your `AgentAction()` function when the agent has finished (or irrevocably failed) its task. You can also set the agent's `Max Steps` property to a positive value and the agent will consider itself done after it has taken that many steps. When the Academy reaches its own `Max Steps` count, it starts the next episode. If you set an agent's `ResetOnDone` property to true, then the agent can attempt its task several times in one episode. (Use the `Agent.AgentReset()` function to prepare the agent to start again.)
See [Agents](Learning-Environment-Design-Agents.md) for detailed information about programming your own agents.
