- [Decisions](#decisions)
- [Observations and Sensors](#observations-and-sensors)
  - [Generating Observations](#generating-observations)
    - [Agent.CollectObservations()](#agentcollectobservations)
    - [Observable Fields and Properties](#observable-fields-and-properties)
    - [ISensor interface and SensorComponents](#isensor-interface-and-sensorcomponents)
  - [Stacking](#stacking)
  - [Vector Observation Summary & Best Practices](#vector-observation-summary--best-practices)
  - [Visual Observations](#visual-observations)
    - [Visual Observation Summary & Best Practices](#visual-observation-summary--best-practices)

write your own Policy. If the Agent has a `Model` file, its Policy will use the
neural network `Model` to make decisions.

When you create an Agent, you should usually extend the base Agent class. This
includes implementing the following methods:

- `Agent.OnEpisodeBegin()` — Called when the Agent resets at the start of an
  episode, including at the beginning of the simulation.
- `Agent.CollectObservations(VectorSensor sensor)` — Called every step that the
  Agent requests a decision. This is one possible way for collecting the
  Agent's observations of the environment; see
  [Generating Observations](#generating-observations) below for more options.
- `Agent.OnActionReceived()` — Called every time the Agent receives an action
  to take. Receives the action chosen by the Agent. It is also common to
  assign a reward in this method.
- `Agent.Heuristic()` — When the `Behavior Type` is set to `Heuristic Only` in
  the Behavior Parameters of the Agent, the Agent will use the `Heuristic()`
  method to generate its actions. As such, the `Heuristic()` method writes to
  a provided array of floats.

As a concrete example, here is how the Ball3DAgent class implements these
methods:

- `Agent.OnEpisodeBegin()` — Resets the agent cube and ball to their starting
  positions. The function randomizes the reset values so that the training
  generalizes to more than a specific starting position and agent cube
  orientation.
- `Agent.CollectObservations(VectorSensor sensor)` — Adds information about the
  orientation of the agent cube, the ball velocity, and the relative position
  between the ball and the cube. Since the `CollectObservations()` method calls
  `VectorSensor.AddObservation()` such that the vector size adds up to 8, the
  Behavior Parameters of the Agent are set with a vector observation space with
  a state size of 8.
- `Agent.OnActionReceived()` — The action results in a small change in the
  agent cube's rotation at each step. In this example, an Agent receives a
  small positive reward for each step it keeps the ball on the agent cube's
  head and a larger, negative reward for dropping the ball. An Agent's episode
  is also ended when it drops the ball so that it will reset with a new ball
  for the next simulation step.
- `Agent.Heuristic()` - Converts the keyboard inputs into actions.
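
For illustration, here is a condensed sketch of two of these overrides, loosely
based on the 3DBall sample; the `ball` and `m_BallRb` fields and the exact
reset ranges are assumptions for this sketch, not the sample's exact code:

```csharp
public override void OnEpisodeBegin()
{
    // Start each episode from a randomized cube tilt so training
    // generalizes beyond a single starting attitude.
    gameObject.transform.rotation = Quaternion.identity;
    gameObject.transform.Rotate(Vector3.right, Random.Range(-10f, 10f));
    gameObject.transform.Rotate(Vector3.forward, Random.Range(-10f, 10f));

    // Re-place the ball above the cube with zeroed velocity.
    m_BallRb.velocity = Vector3.zero;
    ball.transform.position =
        new Vector3(Random.Range(-1.5f, 1.5f), 4f, Random.Range(-1.5f, 1.5f))
        + gameObject.transform.position;
}

public override void Heuristic(float[] actionsOut)
{
    // Map the standard input axes onto the two continuous actions.
    actionsOut[0] = -Input.GetAxis("Horizontal");
    actionsOut[1] = Input.GetAxis("Vertical");
}
```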

## Decisions

Agents request decisions with `Agent.RequestDecision()`. If you want an Agent
to request decisions on its own at regular intervals, add a `Decision
Requester` component to its GameObject; otherwise, your own code should call
`Agent.RequestDecision()` manually.
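
For instance, a turn-based game might only request a decision when it becomes
the agent's turn. A minimal sketch, where `OnTurnStarted` is a hypothetical
hook called by your game code:

```csharp
using Unity.MLAgents;

public class TurnBasedAgent : Agent
{
    // Hypothetical hook invoked by the game's turn manager.
    public void OnTurnStarted()
    {
        // Ask the Policy for an action; OnActionReceived() will be
        // called with the result during the next Academy step.
        RequestDecision();
    }
}
```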

## Observations and Sensors

To make informed decisions, an agent must first make observations of the state
of the environment. The observations are collected by Sensors attached to the
agent GameObject. By default, agents come with a `VectorSensor` which allows
them to collect floating-point observations into a single array. There are
additional sensor components that can be attached to the agent GameObject to
collect their own observations, or modify other observations. These are:

- `CameraSensorComponent` - Allows an image from a `Camera` to be used as an
  observation.
- `RenderTextureSensorComponent` - Allows the content of a `RenderTexture` to
  be used as an observation.
- `RayPerceptionSensorComponent` - Allows information from a set of ray casts
  to be used as an observation.
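
These components are usually added and configured in the Inspector, but they
can also be attached from code. A sketch, assuming the 3D ray variant and that
the `RaysPerDirection` and `RayLength` property names match your installed
release:

```csharp
using Unity.MLAgents.Sensors;
using UnityEngine;

public class AttachRaySensor : MonoBehaviour
{
    void Awake()
    {
        // Attach a ray-cast based sensor to the agent's GameObject.
        var rays = gameObject.AddComponent<RayPerceptionSensorComponent3D>();
        rays.RaysPerDirection = 3; // rays on each side of the center ray
        rays.RayLength = 20f;      // how far each ray is cast
    }
}
```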

In order for an agent to learn, the observations should include all the
information an agent needs to accomplish its task. Without sufficient and
relevant information, an agent may learn poorly or may not learn at all. A
reasonable approach for determining what information should be included is to
consider what you would need to calculate an analytical solution to the
problem, or what you would expect a human to be able to use to solve the
problem.

### Generating Observations

ML-Agents provides multiple ways for an Agent to make observations:

1. Overriding the `Agent.CollectObservations()` method and passing the
   observations to the provided `VectorSensor`.
1. Adding the `[Observable]` attribute to fields and properties on the Agent.
1. Implementing the `ISensor` interface, using a `SensorComponent` attached to
   the Agent to create the `ISensor`.

#### Agent.CollectObservations()

Agent.CollectObservations() is best used for aspects of the environment which
are numerical and non-visual.

The observation feature vector is a list of floating point numbers, which means
you must convert any other data types to a float or a list of floats. The
`VectorSensor.AddObservation` method provides a number of overloads for adding
common types of data to your observation vector. You can add integers and
booleans directly to the observation vector, as well as some common Unity data
types such as `Vector2`, `Vector3`, and `Quaternion`.
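
Each overload expands its argument to a fixed number of floats. A small
illustration (the `m_HasKey` and `m_AmmoCount` fields are hypothetical):

```csharp
public override void CollectObservations(VectorSensor sensor)
{
    sensor.AddObservation(m_HasKey);           // bool: 1 float (0 or 1)
    sensor.AddObservation(m_AmmoCount);        // int: 1 float
    sensor.AddObservation(transform.position); // Vector3: 3 floats
    sensor.AddObservation(transform.rotation); // Quaternion: 4 floats
}
```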

For example, the 3DBall example uses the orientation of the agent cube, the
ball velocity, and the relative position between the ball and the cube as its
state observation:

```csharp
public override void CollectObservations(VectorSensor sensor)
{
    // Orientation of the cube (2 floats)
    sensor.AddObservation(gameObject.transform.rotation.z);
    sensor.AddObservation(gameObject.transform.rotation.x);
    // Relative position of the ball to the cube (3 floats)
    sensor.AddObservation(ball.transform.position - gameObject.transform.position);
    // Velocity of the ball (3 floats)
    sensor.AddObservation(m_BallRb.velocity);
    // 8 floats total
}
```

As an experiment, you can remove the velocity components from the observation
and retrain the 3DBall agent. While it will learn to balance the ball
reasonably well, the performance of the agent without using velocity is
noticeably worse.

The observations passed to `VectorSensor.AddObservation()` must always contain
the same number of elements and must always be in the same order. If the number
of observed entities in an environment can vary, you can pad the calls with
zeros for any missing entities in a specific observation, or you can limit an
agent's observations to a fixed subset. For example, instead of observing every
enemy in an environment, you could only observe the closest five, as sketched
below.
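
A sketch of the padding approach, as a fragment inside `CollectObservations()`;
`GetClosestEnemies` is a hypothetical helper returning a distance-sorted list
with at most `MaxEnemies` entries:

```csharp
const int MaxEnemies = 5;

public override void CollectObservations(VectorSensor sensor)
{
    List<GameObject> enemies = GetClosestEnemies(MaxEnemies);
    for (int i = 0; i < MaxEnemies; i++)
    {
        if (i < enemies.Count)
        {
            // Relative position of a real enemy (3 floats).
            sensor.AddObservation(
                enemies[i].transform.position - transform.position);
        }
        else
        {
            // Pad with zeros so the vector size stays fixed.
            sensor.AddObservation(Vector3.zero);
        }
    }
}
```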

Additionally, when you set up an Agent's `Behavior Parameters` in the Unity
Editor, you must set the **Vector Observations > Space Size** to equal the
number of floats that are written by `CollectObservations()`.

#### Observable Fields and Properties

Another approach is to define the relevant observations as fields or properties
on your Agent class, and annotate them with an `ObservableAttribute`. For
example, in the 3DBall example above, the rigid body velocity could be observed
by adding a property to the Agent:

```csharp
using Unity.MLAgents.Sensors.Reflection;

public class Ball3DAgent : Agent {

    [Observable]
    public Vector3 RigidBodyVelocity
    {
        get { return m_BallRb.velocity; }
    }
}
```

`ObservableAttribute` currently supports most basic types (e.g. floats, ints,
bools), as well as `Vector2`, `Vector3`, `Vector4`, `Quaternion`, and enums.

The behavior of `ObservableAttribute`s is controlled by the "Observable
Attribute Handling" setting in the Agent's `Behavior Parameters`. The possible
values for this are:

* **Ignore** (default) - All ObservableAttributes on the Agent will be ignored.
  If there are no ObservableAttributes on the Agent, this will result in the
  fastest initialization time.
* **Exclude Inherited** - Only members on the declared class will be examined;
  members that are inherited are ignored. This is a reasonable tradeoff between
  performance and flexibility.
* **Examine All** - All members on the class will be examined. This can lead to
  slower startup times.

"Exclude Inherited" is generally sufficient, but if your Agent inherits from
another Agent implementation that has Observable members, you will need to use
"Examine All".
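
For example (a sketch with hypothetical classes):

```csharp
public class BaseAgent : Agent
{
    [Observable]
    public float Health;  // declared on the base class
}

public class DerivedAgent : BaseAgent
{
    // For DerivedAgent to observe Health, "Observable Attribute Handling"
    // must be set to "Examine All"; "Exclude Inherited" would skip members
    // declared on BaseAgent.
}
```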

Internally, `ObservableAttribute` uses reflection to determine which members of
the Agent have ObservableAttributes, and also uses reflection to access the
fields or invoke the properties at runtime. This may be slower than using
`CollectObservations` or an `ISensor`, although this might not be enough to
noticeably affect performance.

**NOTE**: you do not need to adjust the Space Size in the Agent's
`Behavior Parameters` when you add `[Observable]` fields or properties to an
Agent, since their size can be computed before they are used.

#### ISensor interface and SensorComponents

The `ISensor` interface is generally intended for advanced users. The `Write()`
method is used to actually generate the observation, but some other methods,
such as returning the shape of the observations, must also be implemented.

The `SensorComponent` abstract class is used to create the actual `ISensor` at
runtime. It must be attached to the same `GameObject` as the `Agent`, or to a
child `GameObject`.
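
As a sketch, a custom sensor exposing a single float (a hypothetical battery
level) might look like the following. The method names follow the `ISensor`
API referenced on this page (`Write()`, shape reporting, `CreateSensor()`);
check them against your installed release, and note that `Battery` and
`NormalizedCharge` are made-up names for this example:

```csharp
using Unity.MLAgents.Sensors;
using UnityEngine;

public class BatteryLevelSensor : ISensor
{
    readonly Battery m_Battery;  // hypothetical component holding the value

    public BatteryLevelSensor(Battery battery)
    {
        m_Battery = battery;
    }

    public string GetName() { return "BatteryLevel"; }

    // The observation is a vector of one float.
    public int[] GetObservationShape() { return new[] { 1 }; }

    public int Write(ObservationWriter writer)
    {
        writer[0] = m_Battery.NormalizedCharge;
        return 1;  // number of floats written
    }

    // Vector observations are not compressed.
    public byte[] GetCompressedObservation() { return null; }
    public SensorCompressionType GetCompressionType()
    {
        return SensorCompressionType.None;
    }

    public void Update() { }
    public void Reset() { }
}

public class BatteryLevelSensorComponent : SensorComponent
{
    public override ISensor CreateSensor()
    {
        return new BatteryLevelSensor(GetComponent<Battery>());
    }

    public override int[] GetObservationShape() { return new[] { 1 }; }
}
```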

There are several SensorComponents provided in the API:

- `CameraSensorComponent` - Allows an image from a `Camera` to be used as an
  observation.
- `RenderTextureSensorComponent` - Allows the content of a `RenderTexture` to
  be used as an observation.
- `RayPerceptionSensorComponent` - Allows information from a set of ray casts
  to be used as an observation.

**NOTE**: you do not need to adjust the Space Size in the Agent's
`Behavior Parameters` when using `ISensor` SensorComponents.

Internally, both `Agent.CollectObservations` and the `[Observable]` attribute
use `ISensor`s to write observations, although this is mostly abstracted from
the user.

### Vector Observations

Both `Agent.CollectObservations()` and `ObservableAttribute`s produce vector
observations, which are represented as lists of `float`s. `ISensor`s can
produce both vector observations and visual observations, which are
multi-dimensional arrays of floats.

Below are some additional considerations when dealing with vector observations:

#### One-hot encoding categorical information

Type enumerations should be encoded in the _one-hot_ style. That is, add an
element to the feature vector for each element of the enumeration, setting the
element representing the observed member to one and the rest to zero. For
example, if your enumeration contains the values Sword, Shield, and Bow and the
agent observes that the current item is a Bow, you would add the elements 0, 0,
1 to the feature vector. The following code example illustrates how to add a
one-hot observation:

```csharp
enum ItemType { Sword, Shield, Bow, LastItem }

public override void CollectObservations(VectorSensor sensor)
{
    for (int ci = 0; ci < (int)ItemType.LastItem; ci++)
    {
        sensor.AddObservation((int)currentItem == ci ? 1.0f : 0.0f);
    }
}
```

`VectorSensor` also provides a two-argument `AddOneHotObservation()` method as
a shortcut for _one-hot_ style observations. The following example is identical
to the previous one:

```csharp
enum ItemType { Sword, Shield, Bow, LastItem }
const int NUM_ITEM_TYPES = (int)ItemType.LastItem;

public override void CollectObservations(VectorSensor sensor)
{
    // The first argument is the selection index; the second is the
    // number of possibilities.
    sensor.AddOneHotObservation((int)currentItem, NUM_ITEM_TYPES);
}
```

`ObservableAttribute` has built-in support for enums. Note that you don't need
the `LastItem` placeholder in this case:

```csharp
enum ItemType { Sword, Shield, Bow }

public class HeroAgent : Agent
{
    [Observable]
    ItemType m_CurrentItem;
}
```

#### Normalization

For the best results when training, you should normalize the components of
your feature vector to the range [-1, +1] or [0, 1]. For angles that can be
outside the range [0, 360], you can either reduce the angle, or, if the number
of turns is significant, increase the maximum value used in your normalization
formula.
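
A sketch of both cases, assuming `minValue` and `maxValue` are known bounds
for the raw value, and that the angle has been reduced to [0, 180]:

```csharp
// Normalize a raw value into [0, 1] given known environment bounds.
float normalizedValue = (currentValue - minValue) / (maxValue - minValue);

// Normalize a rotation angle: ToAngleAxis returns degrees, so dividing
// a reduced angle by 180 maps it into [0, 1].
float angle;
Vector3 axis;
transform.rotation.ToAngleAxis(out angle, out axis);
float normalizedAngle = angle / 180.0f;
```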

#### Stacking

Stacking refers to repeating observations from previous steps as part of a
larger observation. For example, consider an Agent that generates these
observations in four steps:

```
step 1: [0.1]
step 2: [0.2]
step 3: [0.3]
step 4: [0.4]
```

If we use a stack size of 3, the observations would instead be:

```
step 1: [0.1, 0.0, 0.0]
step 2: [0.2, 0.1, 0.0]
step 3: [0.3, 0.2, 0.1]
step 4: [0.4, 0.3, 0.2]
```

(The observations are padded with zeroes for the first `stackSize - 1` steps.)

This is a simple way to give an Agent limited "memory" without the complexity
of adding a recurrent neural network (RNN).

The steps for enabling stacking depend on how you generate observations:

* For Agent.CollectObservations(), set "Stacked Vectors" on the Agent's
  `Behavior Parameters` to a value greater than 1.
* For ObservableAttribute, set the `numStackedObservations` parameter in the
  constructor, e.g. `[Observable(numStackedObservations: 2)]`.
* For `ISensor`s, wrap them in a `StackingSensor` (which is also an `ISensor`).
  Generally, this should happen in the `CreateSensor()` method of your
  `SensorComponent`, as sketched after this list.
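
Reusing the hypothetical `BatteryLevelSensor` from the earlier sketch, the
wrapping could look like this:

```csharp
public class StackedBatterySensorComponent : SensorComponent
{
    public override ISensor CreateSensor()
    {
        // Each observation the Agent sees becomes the concatenation of
        // the two most recent battery readings.
        var wrapped = new BatteryLevelSensor(GetComponent<Battery>());
        return new StackingSensor(wrapped, 2);
    }

    public override int[] GetObservationShape()
    {
        // The shape grows with the stack size: 1 float x 2 stacked.
        return new[] { 2 };
    }
}
```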

Note that stacking is currently only supported for vector observations;
stacking for visual observations is not supported.

#### Vector Observation Summary & Best Practices

- Vector Observations should include all variables relevant for allowing the
  agent to take the optimally informed decision.
- In cases where Vector Observations need to be remembered or compared over
  time, the `Stacked Vectors` value in the agent GameObject's
  `Behavior Parameters` should be changed.
- Categorical variables such as type of object (Sword, Shield, Bow) should be
  encoded in one-hot fashion (i.e. `3` -> `0, 0, 1`). This can be done
  automatically using the `AddOneHotObservation()` method of the
  `VectorSensor`, or using `[Observable]` on an enum field or property of the
  Agent.
- In general, all inputs should be normalized to be in the range 0 to +1 (or -1
  to 1). For example, the `x` position information of an agent where the
  maximum possible value is `maxValue` should be recorded as
  `VectorSensor.AddObservation(transform.position.x / maxValue);` rather than
  `VectorSensor.AddObservation(transform.position.x);`.