
Update documentation appropriately

/develop-generalizationTraining-TrainerController
Arthur Juliani, 6 years ago
Current commit
ebd9dab4
3 files changed, with 18 insertions and 36 deletions
  1. docs/Learning-Environment-Design-Agents.md (25 changes)
  2. docs/Learning-Environment-Design-Brains.md (1 change)
  3. docs/Learning-Environment-Examples.md (28 changes)

docs/Learning-Environment-Design-Agents.md (25 changes)


## Observations
- To make decisions, an agent must observe its environment to determine its current state. A state observation can take the following forms:
+ To make decisions, an agent must observe its environment in order to infer the state of the world. A state observation can take the following forms:
- * **Continuous Vector** — a feature vector consisting of an array of numbers.
- * **Discrete Vector** — an index into a state table (typically only useful for the simplest of environments).
+ * **Vector Observation** — a feature vector consisting of an array of floating point numbers.
- When you use the **Continuous** or **Discrete** vector observation space for an agent, implement the `Agent.CollectObservations()` method to create the feature vector or state index. When you use **Visual Observations**, you only need to identify which Unity Camera objects will provide images and the base Agent class handles the rest. You do not need to implement the `CollectObservations()` method when your agent uses visual observations (unless it also uses vector observations).
+ When you use vector observations for an agent, implement the `Agent.CollectObservations()` method to create the feature vector. When you use **Visual Observations**, you only need to identify which Unity Camera objects will provide images and the base Agent class handles the rest. You do not need to implement the `CollectObservations()` method when your agent uses visual observations (unless it also uses vector observations).
- ### Continuous Vector Observation Space: Feature Vectors
+ ### Vector Observation Space: Feature Vectors
For agents using a continuous state space, you create a feature vector to represent the agent's observation at each step of the simulation. The Brain class calls the `CollectObservations()` method of each of its agents. Your implementation of this function must call `AddVectorObs` to add vector observations.

When you set up an Agent's brain in the Unity Editor, set the following properties to use a continuous vector observation:
* **Space Size** — The state size must match the length of your feature vector.
* **Space Type** — Set to **Continuous**.
* **Brain Type** — Set to **External** during training; set to **Internal** to use the trained model.
The observation feature vector is a list of floating point numbers, which means you must convert any other data types to a float or a list of floats.
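For illustration only, a minimal sketch of such a `CollectObservations()` implementation is shown below; the `ball` field and the particular quantities observed are assumptions for this example, not something prescribed by this page.

```csharp
// Hypothetical reference to another object in the scene, assigned in the Inspector.
public GameObject ball;

public override void CollectObservations()
{
    // Two components of this object's rotation, added as individual floats.
    AddVectorObs(gameObject.transform.rotation.z);
    AddVectorObs(gameObject.transform.rotation.x);

    // Position of the ball relative to the agent, converted to three floats.
    AddVectorObs(ball.transform.position.x - gameObject.transform.position.x);
    AddVectorObs(ball.transform.position.y - gameObject.transform.position.y);
    AddVectorObs(ball.transform.position.z - gameObject.transform.position.z);

    // Velocity of the ball's Rigidbody, also as individual floats.
    AddVectorObs(ball.GetComponent<Rigidbody>().velocity.x);
    AddVectorObs(ball.GetComponent<Rigidbody>().velocity.y);
    AddVectorObs(ball.GetComponent<Rigidbody>().velocity.z);
}
```

With these eight calls, the Brain's `Space Size` would be set to 8.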

In addition, make sure that the Agent's Brain expects a visual observation. In the Brain inspector, under **Brain Parameters** > **Visual Observations**, specify the number of Cameras the agent is using for its visual observations. For each visual observation, set the width and height of the image (in pixels) and whether or not the observation is color or grayscale (when `Black And White` is checked).
### Discrete Vector Observation Space: Table Lookup
You can use the discrete vector observation space when an agent only has a limited number of possible states and those states can be enumerated by a single number. For instance, the [Basic example environment](Learning-Environment-Examples.md#basic) in ML-Agents defines an agent with a discrete vector observation space. The states of this agent are the integer steps between two linear goals. In the Basic example, the agent learns to move to the goal that provides the greatest reward.
More generally, the discrete vector observation identifier could be an index into a table of the possible states. However, tables quickly become unwieldy as the environment becomes more complex. For example, even a simple game like [tic-tac-toe has 765 possible states](https://en.wikipedia.org/wiki/Game_complexity) (far more if you don't reduce the number of observations by combining those that are rotations or reflections of each other).
To implement a discrete state observation, implement the `CollectObservations()` method of your Agent subclass and call `AddVectorObs()` with a single number representing the state:
```csharp
public override void CollectObservations()
{
    AddVectorObs(stateIndex); // stateIndex is the state identifier
}
```
## Vector Actions
An action is an instruction from the brain that the agent carries out. The action is passed to the agent as a parameter when the Academy invokes the agent's `AgentAction()` function. When you specify that the vector action space is **Continuous**, the action parameter passed to the agent is an array of control signals with length equal to the `Vector Action Space Size` property. When you specify a **Discrete** vector action space type, the action parameter is an array containing only a single value, which is an index into your list or table of commands. In the **Discrete** vector action space type, the `Vector Action Space Size` is the number of elements in your action table. Set the `Vector Action Space Size` and `Vector Action Space Type` properties on the Brain object assigned to the agent (using the Unity Editor Inspector window).
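As a rough sketch (the force scale and the use of a Rigidbody here are illustrative assumptions, not part of this change), an `AgentAction()` override for a **Continuous** vector action space of size 2 could look like this:

```csharp
public override void AgentAction(float[] vectorAction, string textAction)
{
    // Continuous action space of size 2: each element is a control signal.
    float moveX = Mathf.Clamp(vectorAction[0], -1f, 1f);
    float moveZ = Mathf.Clamp(vectorAction[1], -1f, 1f);

    // Apply the signals to the agent, here as a force on its Rigidbody (illustrative).
    GetComponent<Rigidbody>().AddForce(new Vector3(moveX, 0f, moveZ) * 10f);
}
```

For a **Discrete** vector action space, `vectorAction` instead holds a single value, which you would cast to an int and use as an index into your table of commands.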

docs/Learning-Environment-Design-Brains.md (1 change)


* `Brain Parameters` - Define vector observations, visual observation, and vector actions for the Brain.
    * `Vector Observation`
        * `Space Type` - Corresponds to whether the observation vector contains a single integer (Discrete) or a series of real-valued floats (Continuous).
        * `Space Size` - Length of the vector observation for the brain (in _Continuous_ space type), or number of possible values (in _Discrete_ space type).
        * `Stacked Vectors` - The number of previous vector observations that will be stacked and used collectively for decision making. This results in the effective size of the vector observation being passed to the brain being: _Space Size_ x _Stacked Vectors_.
    * `Visual Observations` - Describes height, width, and whether to grayscale visual observations for the Brain.

docs/Learning-Environment-Examples.md (28 changes)


* +0.1 for arriving at suboptimal state.
* +1.0 for arriving at optimal state.
* Brains: One brain with the following observation/action space.
- * Vector Observation space: (Discrete) One variable corresponding to current state.
+ * Vector Observation space: One variable corresponding to current state.
* Vector Action space: (Discrete) Two possible actions (Move left, move right).
* Visual Observations: None.
* Reset Parameters: None

* +0.1 for every step the ball remains on the platform.
* -1.0 if the ball falls from the platform.
* Brains: One brain with the following observation/action space.
- * Vector Observation space: (Continuous) 8 variables corresponding to rotation of platform, and position, rotation, and velocity of ball.
- * Vector Observation space (Hard Version): (Continuous) 5 variables corresponding to rotation of platform and position and rotation of ball.
+ * Vector Observation space: 8 variables corresponding to rotation of platform, and position, rotation, and velocity of ball.
+ * Vector Observation space (Hard Version): 5 variables corresponding to rotation of platform and position and rotation of ball.
* Vector Action space: (Continuous) Size of 2, with one value corresponding to X-rotation, and the other to Z-rotation.
* Visual Observations: None.
* Reset Parameters: None

* +0.1 To agent when hitting ball over net.
* -0.1 To agent who lets ball hit their ground, or hits ball out of bounds.
* Brains: One brain with the following observation/action space.
- * Vector Observation space: (Continuous) 8 variables corresponding to position and velocity of ball and racket.
+ * Vector Observation space: 8 variables corresponding to position and velocity of ball and racket.
* Vector Action space: (Continuous) Size of 2, corresponding to movement toward net or away from net, and jumping.
* Visual Observations: None.
* Reset Parameters: One, corresponding to size of ball.

* -0.0025 for every step.
* +1.0 if the block touches the goal.
* Brains: One brain with the following observation/action space.
- * Vector Observation space: (Continuous) 15 variables corresponding to position and velocities of agent, block, and goal.
+ * Vector Observation space: 15 variables corresponding to position and velocities of agent, block, and goal.
* Vector Action space: (Continuous) Size of 2, corresponding to movement in X and Z directions.
* Visual Observations (Optional): One first-person camera. Use `VisualPushBlock` scene.
* Reset Parameters: None.

* +1.0 if the agent touches the goal.
* -1.0 if the agent falls off the platform.
* Brains: Two brains, each with the following observation/action space.
- * Vector Observation space: (Continuous) 16 variables corresponding to position and velocities of agent, block, and goal, plus the height of the wall.
+ * Vector Observation space: 16 variables corresponding to position and velocities of agent, block, and goal, plus the height of the wall.
* Vector Action space: (Discrete) Size of 74, corresponding to 14 raycasts each detecting 4 possible objects, plus the global position of the agent and whether or not the agent is grounded.
* Visual Observations: None.
* Reset Parameters: 4, corresponding to the height of the possible walls.

* Agent Reward Function (independent):
* +0.1 Each step agent's hand is in goal location.
* Brains: One brain with the following observation/action space.
- * Vector Observation space: (Continuous) 26 variables corresponding to position, rotation, velocity, and angular velocities of the two arm Rigidbodies.
+ * Vector Observation space: 26 variables corresponding to position, rotation, velocity, and angular velocities of the two arm Rigidbodies.
* Vector Action space: (Continuous) Size of 4, corresponding to torque applicable to two joints.
* Visual Observations: None.
* Reset Parameters: Two, corresponding to goal size, and goal movement speed.

* +0.03 times body velocity in the goal direction.
* +0.01 times body direction alignment with goal direction.
* Brains: One brain with the following observation/action space.
- * Vector Observation space: (Continuous) 117 variables corresponding to position, rotation, velocity, and angular velocities of each limb plus the acceleration and angular acceleration of the body.
+ * Vector Observation space: 117 variables corresponding to position, rotation, velocity, and angular velocities of each limb plus the acceleration and angular acceleration of the body.
* Vector Action space: (Continuous) Size of 20, corresponding to target rotations for joints.
* Visual Observations: None.
* Reset Parameters: None

* +1 for interaction with yellow banana
* -1 for interaction with blue banana.
* Brains: One brain with the following observation/action space.
- * Vector Observation space: (Continuous) 53 corresponding to velocity of agent (2), whether agent is frozen and/or shot its laser (2), plus ray-based perception of objects around agent's forward direction (49; 7 raycast angles with 7 measurements for each).
+ * Vector Observation space: 53 corresponding to velocity of agent (2), whether agent is frozen and/or shot its laser (2), plus ray-based perception of objects around agent's forward direction (49; 7 raycast angles with 7 measurements for each).
* Vector Action space: (Continuous) Size of 3, corresponding to forward movement, y-axis rotation, and whether to use laser to disable other agents.
* Visual Observations (Optional): First-person camera per-agent. Use `VisualBanana` scene.
* Reset Parameters: None.

* -0.1 For moving to incorrect goal.
* -0.0003 Existential penalty.
* Brains: One brain with the following observation/action space:
- * Vector Observation space: (Continuous) 30 corresponding to local ray-casts detecting objects, goals, and walls.
+ * Vector Observation space: 30 corresponding to local ray-casts detecting objects, goals, and walls.
* Vector Action space: (Discrete) 4 corresponding to agent rotation and forward/backward movement.
* Visual Observations (Optional): First-person view for the agent. Use `VisualHallway` scene.
* Reset Parameters: None.

* -1 For bouncing out of bounds.
* -0.05 Times the action squared. Energy expenditure penalty.
* Brains: One brain with the following observation/action space:
- * Vector Observation space: (Continuous) 6 corresponding to local position of agent and banana.
+ * Vector Observation space: 6 corresponding to local position of agent and banana.
* Vector Action space: (Continuous) 3 corresponding to agent force applied for the jump.
* Visual Observations: None.
* Reset Parameters: None.

* +0.1 When ball enters opponent's goal.
* +0.001 Existential bonus.
* Brains: Two brains with the following observation/action space:
- * Vector Observation space: (Continuous) 112 corresponding to local 14 ray casts, each detecting 7 possible object types, along with the object's distance. Perception is in 180 degree view from front of agent.
+ * Vector Observation space: 112 corresponding to local 14 ray casts, each detecting 7 possible object types, along with the object's distance. Perception is in 180 degree view from front of agent.
* Vector Action space: (Discrete)
* Striker: 6 corresponding to forward, backward, sideways movement, as well as rotation.
* Goalie: 4 corresponding to forward, backward, sideways movement.

* +0.01 times body direction alignment with goal direction.
* -0.01 times head velocity difference from body velocity.
* Brains: One brain with the following observation/action space.
- * Vector Observation space: (Continuous) 215 variables corresponding to position, rotation, velocity, and angular velocities of each limb, along with goal direction.
+ * Vector Observation space: 215 variables corresponding to position, rotation, velocity, and angular velocities of each limb, along with goal direction.
* Vector Action space: (Continuous) Size of 39, corresponding to target rotations applicable to the joints.
* Visual Observations: None.
* Reset Parameters: None.

* Agent Reward Function (independent):
* +2 For moving to golden brick (minus 0.001 per step).
* Brains: One brain with the following observation/action space:
- * Vector Observation space: (Continuous) 148 corresponding to local ray-casts detecting switch, bricks, golden brick, and walls, plus variable indicating switch state.
+ * Vector Observation space: 148 corresponding to local ray-casts detecting switch, bricks, golden brick, and walls, plus variable indicating switch state.
* Vector Action space: (Discrete) 4 corresponding to agent rotation and forward/backward movement.
* Visual Observations (Optional): First-person camera per-agent. Use `VisualPyramids` scene.
* Reset Parameters: None.