adding documentation

4 years ago · 3ae1675c
--- a/docs/Learning-Environment-Design-Agents.md
+++ b/docs/Learning-Environment-Design-Agents.md
    - [Visual Observation Summary & Best Practices](#visual-observation-summary--best-practices)
  - [Raycast Observations](#raycast-observations)
    - [RayCast Observation Summary & Best Practices](#raycast-observation-summary--best-practices)
+  - [Variable Length Observations](#variable-length-observations)
+    - [Variable Length Observation Summary & Best Practices](#variable-length-observation-summary--best-practices)
 - [Actions and Actuators](#actions-and-actuators)
  - [Continuous Actions](#continuous-actions)
  - [Discrete Actions](#discrete-actions)
  for the agent that doesn't require a fully rendered image to convey.
 - Use as few rays and tags as necessary to solve the problem in order to improve
  learning stability and agent performance.
+
+### Variable Length Observations
+
+It is possible to have observations of variable size by using a `BufferSensor`.
+You can add a `BufferSensor` to your Agent by adding a `BufferSensorComponent` to
+its GameObject.
+The `BufferSensor` can be useful in situations in which the Agent must pay
+attention to a varying number of entities. On the trainer side, the `BufferSensor`
+is processed using an attention module; more information about attention
+mechanisms can be found in [this paper](https://arxiv.org/abs/1706.03762). Training or
+doing inference with variable length observations can be slower than using
+a flat vector observation, but it allows you to represent more complex problems
+such as our [Sorter environment](Learning-Environment-Examples.md#sorter).
+Note that even though the `BufferSensor` can process a variable number of
+entities, you still need to define a maximum number of entities. This is
+because our network architecture needs to know the shape of the
+observations in advance. If fewer entities than the maximum are observed, the
+observation is padded with zeros, but the trainer ignores
+the padding.
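The padding behaviour described above can be illustrated with a small conceptual sketch, assuming a simplified setup. It is not ML-Agents' actual attention module, which uses learned self-attention. Here, a fixed query vector and dot-product scores stand in for the learned parts, and `pad_entities`, `masked_attention_weights` and `pool` are hypothetical helpers. The sketch zero-pads a variable number of entities up to a maximum and builds a mask so that padded rows receive exactly zero attention weight.

```python
import math
from typing import List, Sequence, Tuple


def pad_entities(entities: Sequence[Sequence[float]], max_entities: int,
                 obs_size: int) -> Tuple[List[List[float]], List[float]]:
    """Pad a variable-length list of entity observations with zero rows.

    Returns the padded matrix and a mask (1.0 = real entity, 0.0 = padding).
    """
    if len(entities) > max_entities:
        raise ValueError("more entities than max_entities")
    padded = [list(e) for e in entities]
    mask = [1.0] * len(entities)
    while len(padded) < max_entities:
        padded.append([0.0] * obs_size)
        mask.append(0.0)
    return padded, mask


def masked_attention_weights(scores: Sequence[float], mask: Sequence[float]) -> List[float]:
    """Softmax over the scores in which padded positions get exactly zero weight."""
    real = [s for s, m in zip(scores, mask) if m > 0.0]
    if not real:
        return [0.0] * len(scores)
    top = max(real)  # subtract the max for numerical stability
    exps = [math.exp(s - top) if m > 0.0 else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def pool(padded: List[List[float]], mask: List[float],
         query: Sequence[float]) -> List[float]:
    """Attention-weighted average of the entity rows, using dot-product scores."""
    if not padded:
        return []
    scores = [sum(q * x for q, x in zip(query, row)) for row in padded]
    weights = masked_attention_weights(scores, mask)
    obs_size = len(padded[0])
    return [sum(w * row[i] for w, row in zip(weights, padded)) for i in range(obs_size)]


# Usage: two real entities padded up to a maximum of four.
padded, mask = pad_entities([[1.0, 0.0], [0.0, 1.0]], max_entities=4, obs_size=2)
print(pool(padded, mask, [1.0, 1.0]))
```

Because the mask removes padded rows from the softmax, the pooled result is the same whether the maximum is 2 or 4. That is the sense in which the padding is ignored.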
+
+The `BufferSensor` constructor and Editor inspector take two arguments:
+ - `Observation Size`: The number of floats used to represent each entity.
+ This number is fixed and all entities must
+ have the same representation. For example, if each entity you want to
+ put into the `BufferSensor` is described by its position and velocity
+ (a 3D vector each), then the `Observation Size` should be 6 floats.
+ - `Maximum Number of Entities`: The maximum number of entities
+ the `BufferSensor` can collect.
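The `Observation Size` example above (position plus velocity, 6 floats) can be sketched in Python. The names `OBSERVATION_SIZE`, `MAX_NUM_ENTITIES` and `entity_observation` are hypothetical, and the value 10 for the maximum is purely illustrative. In a Unity project this flattening would happen in C# before the array is handed to the sensor.

```python
from typing import List, Sequence

OBSERVATION_SIZE = 6   # 3 floats for position + 3 floats for velocity
MAX_NUM_ENTITIES = 10  # illustrative value only


def entity_observation(position: Sequence[float], velocity: Sequence[float]) -> List[float]:
    """Flatten one entity's 3D position and 3D velocity into a fixed-size float list."""
    if len(position) != 3 or len(velocity) != 3:
        raise ValueError("position and velocity must both be 3D vectors")
    return [float(x) for x in position] + [float(x) for x in velocity]


print(entity_observation((1.0, 2.0, 3.0), (0.5, 0.0, -0.5)))
```

Every entity must go through the same flattening so that all rows share the same layout and length.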
+
+To add an entity's observations to the `BufferSensor`, call
+`BufferSensor.AppendObservation()` (or `BufferSensorComponent.AppendObservation()`),
+passing a float array of length `Observation Size`.
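The append contract described above can be modelled with a small mock: each call adds one entity of exactly `Observation Size` floats, and at most `Maximum Number of Entities` entities are kept. `MockBufferSensor`, `append_observation` and `flush` are hypothetical Python names, not the ML-Agents C# API. The choice to silently drop entities beyond the maximum is an assumption of this mock and does not document how the real sensor behaves.

```python
from typing import List, Sequence, Tuple


class MockBufferSensor:
    """Hypothetical stand-in for the BufferSensor contract (not the ML-Agents API)."""

    def __init__(self, observation_size: int, max_num_entities: int):
        self.observation_size = observation_size
        self.max_num_entities = max_num_entities
        self._rows: List[List[float]] = []

    def append_observation(self, obs: Sequence[float]) -> None:
        """Add one entity; its length must equal observation_size."""
        if len(obs) != self.observation_size:
            raise ValueError(f"expected {self.observation_size} floats, got {len(obs)}")
        # Assumption of this mock: entities past the maximum are silently dropped.
        if len(self._rows) < self.max_num_entities:
            self._rows.append([float(x) for x in obs])

    def flush(self) -> Tuple[List[List[float]], int]:
        """Return the zero-padded buffer and the number of real entities, then reset."""
        count = len(self._rows)
        padding = [[0.0] * self.observation_size
                   for _ in range(self.max_num_entities - count)]
        padded = self._rows + padding
        self._rows = []
        return padded, count


sensor = MockBufferSensor(observation_size=2, max_num_entities=3)
sensor.append_observation([1.0, 2.0])
print(sensor.flush())
```

Resetting in `flush` mirrors the idea that entities are collected anew for each observation step.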
+
+__Note__: Currently, the observations put into the `BufferSensor` are
+not normalized, so you need to normalize your observations manually
+to the range [-1, 1].
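One common way to do the manual normalization mentioned in the note is min-max scaling into [-1, 1], assuming you know a sensible lower and upper bound for each float. The helpers `normalize` and `normalize_entity` are hypothetical, and the bounds in the usage line are made-up examples. Clamping out-of-range values is a design choice of this sketch.

```python
from typing import List, Sequence, Tuple


def normalize(value: float, low: float, high: float) -> float:
    """Linearly map value from [low, high] to [-1, 1], clamping values outside the range."""
    if high <= low:
        raise ValueError("high must be greater than low")
    clamped = min(max(value, low), high)
    return 2.0 * (clamped - low) / (high - low) - 1.0


def normalize_entity(obs: Sequence[float],
                     bounds: Sequence[Tuple[float, float]]) -> List[float]:
    """Normalize each float of an entity observation with its own (low, high) bounds."""
    if len(obs) != len(bounds):
        raise ValueError("need one (low, high) pair per observation float")
    return [normalize(v, lo, hi) for v, (lo, hi) in zip(obs, bounds)]


# Example: a coordinate assumed to lie in [-10, 10] and a value assumed to lie in [1, 20].
print(normalize_entity([0.0, 20.0], [(-10.0, 10.0), (1.0, 20.0)]))
```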
+
+#### Variable Length Observation Summary & Best Practices
+ - Attach a `BufferSensorComponent` to the Agent's GameObject to use a `BufferSensor`
+ - Call `BufferSensor.AppendObservation()` to add the observations of an
+ entity to the `BufferSensor`
+ - Normalize the entities' observations before feeding them into the `BufferSensor`
+

 ## Actions and Actuators

--- a/docs/Learning-Environment-Examples.md
+++ b/docs/Learning-Environment-Examples.md
  - 37.6 for vector observations
  - 34.2 for simple heuristic (pick a random valid move)
  - 37.0 for greedy heuristic (pick the highest-scoring valid move)
+
+## Sorter
+![Sorter](images/sorter.png)
+
+ - Set-up: The Agent is in a circular room with numbered tiles. The values of the
+ tiles are random integers between 1 and 20. The tiles present in the room are
+ randomized at each episode. When the Agent visits a tile, it turns green.
+ - Goal: Visit all the tiles in ascending order.
+ - Agents: The environment contains a single Agent
+ - Agent Reward Function:
+  - -0.0002 existential penalty.
+  - +1 for visiting the correct tile (the next one in ascending order).
+  - -1 for visiting an incorrect tile.
+ - BehaviorParameters:
+  - Vector Observations: 4 floats (2 for position and 2 for orientation).
+  - Variable Length Observations: Between 1 and 20 entities (one for each tile),
+  each with 22 observations: the first 20 are a one-hot encoding of the tile's value,
+  the 21st represents the position of the tile, and the 22nd is 1 if the tile was
+  visited and 0 otherwise.
+  - Actions: 3 discrete action branches, corresponding to forward/backward movement,
+  sideways movement, and rotation.
+  - Float Properties: One
+    - num_tiles: The maximum number of tiles to sample.
+      - Default: 2
+      - Recommended Minimum: 1
+      - Recommended Maximum: 20
+  - Benchmark Mean Reward: Depends on the number of tiles.
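The 22-float tile entity described in the Sorter observations can be sketched in Python. This is not the environment's actual C# code, and `tile_observation` is a hypothetical helper. Two points are assumptions. First, a tile value `v` is assumed to map to one-hot index `v - 1`. Second, the documentation does not say what the single position float encodes, so the sketch takes it as an already-normalized scalar input.

```python
from typing import List

HIGHEST_TILE_VALUE = 20  # tile values range from 1 to 20


def tile_observation(value: int, position: float, visited: bool) -> List[float]:
    """Build one 22-float Sorter tile entity: one-hot value, position, visited flag."""
    if not 1 <= value <= HIGHEST_TILE_VALUE:
        raise ValueError("tile value must be between 1 and 20")
    obs = [0.0] * (HIGHEST_TILE_VALUE + 2)
    obs[value - 1] = 1.0                  # floats 1-20: one-hot tile value (assumed v -> v - 1)
    obs[HIGHEST_TILE_VALUE] = position    # float 21: tile position (assumed pre-normalized)
    obs[HIGHEST_TILE_VALUE + 1] = 1.0 if visited else 0.0  # float 22: visited flag
    return obs


# One entity per tile in the room; here, three tiles.
tiles = [tile_observation(3, 0.25, True), tile_observation(7, -0.5, False),
         tile_observation(12, 0.9, False)]
print(len(tiles), len(tiles[0]))
```

Each such list would be passed to the sensor once per tile, so a room with `num_tiles` tiles yields that many entities.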
--- a/docs/images/sorter.png
+++ b/docs/images/sorter.png