adding documentation

/bullet-hell-barracuda-test-1.3.1
vincentpierre 3 years ago
Current commit
3ae1675c
3 files changed, with 1073 insertions and 0 deletions
  1. 45
      docs/Learning-Environment-Design-Agents.md
  2. 27
      docs/Learning-Environment-Examples.md
  3. 1001
      docs/images/sorter.png

45
docs/Learning-Environment-Design-Agents.md


- [Visual Observation Summary & Best Practices](#visual-observation-summary--best-practices)
- [Raycast Observations](#raycast-observations)
- [RayCast Observation Summary & Best Practices](#raycast-observation-summary--best-practices)
- [Variable Length Observations](#variable-length-observations)
- [Variable Length Observation Summary & Best Practices](#variable-length-observation-summary--best-practices)
- [Actions and Actuators](#actions-and-actuators)
- [Continuous Actions](#continuous-actions)
- [Discrete Actions](#discrete-actions)

for the agent that doesn't require a fully rendered image to convey.
- Use as few rays and tags as necessary to solve the problem in order to improve
learning stability and agent performance.
### Variable Length Observations
It is possible to have observations be of variable size by using a `BufferSensor`.
You can add a `BufferSensor` to your Agent by adding a `BufferSensorComponent` to
its GameObject.
The `BufferSensor` can be useful in situations in which the Agent must pay
attention to a varying number of entities. On the trainer side, the `BufferSensor`
is processed using an attention module. More information about attention
mechanisms can be found [here](https://arxiv.org/abs/1706.03762). Training or
running inference with variable length observations can be slower than using
a flat vector observation, but it lets you represent more complex problems
such as our [Sorter environment](Learning-Environment-Examples.md#sorter).
Note that even though the `BufferSensor` can process a variable number of
entities, you still need to define a maximum number of entities. This is
because our network architecture needs to know the shape of the
observations. If fewer entities are processed than the maximum, the
observation will be padded with zeros, but the trainer will ignore
the padding.
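The padding behaviour described above can be illustrated with a minimal Python sketch. It is not the ML-Agents implementation: the function `pad_entities` and its parameter names are hypothetical, and the mask is only one plausible way for a consumer to tell real entities from padding.

```python
from typing import List, Tuple


def pad_entities(
    entities: List[List[float]],
    observation_size: int,
    max_num_entities: int,
) -> Tuple[List[List[float]], List[bool]]:
    """Pad a variable-length list of entities to a fixed shape.

    Returns the zero-padded buffer and a mask that is True for real
    entities and False for padding rows.
    """
    # Assumption of this sketch: reject extra entities instead of guessing
    # how the real sensor treats them.
    if len(entities) > max_num_entities:
        raise ValueError("more entities than the declared maximum")
    for entity in entities:
        if len(entity) != observation_size:
            raise ValueError("every entity must have observation_size floats")

    num_padding = max_num_entities - len(entities)
    padded = [list(e) for e in entities]
    padded += [[0.0] * observation_size for _ in range(num_padding)]
    mask = [True] * len(entities) + [False] * num_padding
    return padded, mask


# Two real entities in a buffer that can hold up to four.
padded, mask = pad_entities(
    [[0.1, -0.2], [0.5, 0.9]], observation_size=2, max_num_entities=4
)
```

Here `padded` always has `max_num_entities` rows of `observation_size` floats, so its shape is fixed even though the number of real entities varies.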
The `BufferSensor` constructor and Editor inspector have two arguments:
- `Observation Size` : This is how many floats each entity will be
represented with. This number is fixed, and all entities must
have the same representation. For example, if the relevant information for
each entity you want to put into the `BufferSensor` is its position and
speed (3 floats each), then the `Observation Size` should be 6 floats.
- `Maximum Number of Entities` : This is the maximum number of entities
the `BufferSensor` will be able to collect.
To add an entity's observations to the `BufferSensor`, call
`BufferSensor.AppendObservation()` (or `BufferSensorComponent.AppendObservation()`)
with a float array of size `Observation Size` as the argument.
__Note__: Currently, the observations put into the `BufferSensor` are
not normalized. You will need to normalize your observations manually
so that they lie between -1 and 1.
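As a rough illustration of manual normalization, the Python sketch below scales a 6-float entity (position and velocity) into the range -1 to 1. The helper `normalize_entity` and the bounds `max_distance` and `max_speed` are hypothetical; pick bounds that match your own environment. Clamping out-of-range values is a choice made for this sketch, not a requirement stated by the `BufferSensor`.

```python
from typing import List, Sequence


def normalize_entity(
    position: Sequence[float],
    velocity: Sequence[float],
    max_distance: float,
    max_speed: float,
) -> List[float]:
    """Scale position and velocity components into [-1, 1]."""

    def scale(value: float, limit: float) -> float:
        # Divide by the expected maximum magnitude, then clamp so that
        # outliers cannot leave the [-1, 1] range.
        return max(-1.0, min(1.0, value / limit))

    scaled_position = [scale(p, max_distance) for p in position]
    scaled_velocity = [scale(v, max_speed) for v in velocity]
    return scaled_position + scaled_velocity


# 3 floats for position + 3 floats for velocity = 6 floats per entity.
obs = normalize_entity(
    position=(5.0, 0.0, -10.0),
    velocity=(2.0, 0.0, 40.0),
    max_distance=10.0,
    max_speed=20.0,
)
```

The resulting list has the same length as the `Observation Size` used in the example above (6 floats) and could then be passed to `AppendObservation()` on the Unity side.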
#### Variable Length Observation Summary & Best Practices
- Attach a `BufferSensorComponent` to the Agent's GameObject to use a `BufferSensor`
- Call `BufferSensor.AppendObservation()` to add the observations of an
entity to the `BufferSensor`
- Normalize the entities' observations before feeding them into the `BufferSensor`
## Actions and Actuators

27
docs/Learning-Environment-Examples.md


- 37.6 for vector observations
- 34.2 for simple heuristic (pick a random valid move)
- 37.0 for greedy heuristic (pick the highest-scoring valid move)
## Sorter
![Sorter](images/sorter.png)
- Set-up: The Agent is in a circular room with numbered tiles. The values of the
tiles are random between 1 and 20. The tiles present in the room are randomized
at each episode. When the Agent visits a tile, the tile turns green.
- Goal: Visit all the tiles in ascending order.
- Agents: The environment contains a single Agent.
- Agent Reward Function:
  - -0.0002 Existential penalty.
  - +1 For visiting the right tile
  - -1 For visiting the wrong tile
- BehaviorParameters:
  - Vector Observations : 4 : 2 floats for position and 2 floats for orientation
  - Variable Length Observations : Between 1 and 20 entities (one for each tile),
    each with 22 observations. The first 20 are a one-hot encoding of the value
    of the tile, the 21st represents the position of the tile, and the 22nd is 1
    if the tile was visited and 0 otherwise.
  - Actions: 3 discrete branched actions corresponding to forward, backward,
    sideways movement, as well as rotation.
- Float Properties: One
  - num_tiles: The maximum number of tiles to sample.
    - Default: 2
    - Recommended Minimum: 1
    - Recommended Maximum: 20
- Benchmark Mean Reward: Depends on the number of tiles.
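
The 22-float tile representation described above can be sketched in Python as follows. This is only an illustration of the layout, not the environment's actual code: the function `encode_tile` is hypothetical, and it assumes that a tile with value `v` sets index `v - 1` of the one-hot part and that the position is already a single normalized float.

```python
from typing import List

MAX_TILE_VALUE = 20  # tile values range from 1 to 20


def encode_tile(value: int, position: float, visited: bool) -> List[float]:
    """Build a 22-float observation for one tile.

    Layout: 20 one-hot floats for the value, 1 float for the position,
    1 float for the visited flag.
    """
    if not 1 <= value <= MAX_TILE_VALUE:
        raise ValueError("tile value must be between 1 and 20")
    one_hot = [0.0] * MAX_TILE_VALUE
    one_hot[value - 1] = 1.0  # assumed indexing: value v maps to index v - 1
    return one_hot + [position, 1.0 if visited else 0.0]


# A visited tile with value 3, at an assumed normalized position of 0.25.
tile = encode_tile(value=3, position=0.25, visited=True)
```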

1001
docs/images/sorter.png
File diff suppressed because it is too large to display