Merge branch 'master' into release_13_branch-to-master

3 years ago · 747e2228
--- a/com.unity.ml-agents.extensions/Documentation~/Grid-Sensor.md
+++ b/com.unity.ml-agents.extensions/Documentation~/Grid-Sensor.md

 ### Channel Based

-The Channel Based Grid Observations represent obsevations in a normalized form with 0 to 1. To distinguish between categorical and continuous data, one would use the ChannelDepth array to signify the ranges that the values in the `channelValues` array could take. If one sets ChannelDepth[i] to be 1, it is assumed that the value of `channelValues[i]` is already normalized. Else ChannelDepth[i] represents the total number of possible values that `channelValues[i]` can take and will be used for normalization.
+The Channel Based Grid Observations is perhaps the simplest in terms of usability and similarity with other machine learning applications. Each grid is of size WxHxC where C is the number of channels. To distinguish between categorical and continuous data, one would use the ChannelDepth array to signify the ranges that the values in the `channelValues` array could take. If one sets ChannelDepth[i] to be 1, it is assumed that the value of `channelValues[i]` is already normalized. Else ChannelDepth[i] represents the total number of possible values that `channelValues[i]` can take.
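Reviewer note: the ChannelDepth rule described in this hunk can be sketched as below. This is a minimal illustration in Python, not the package's C# implementation. The helper names `normalize_channels` and `index_of_tag`, the 1-based tag indexing, and the division by `ChannelDepth[i]` are assumptions inferred from the removed wording ("will be used for normalization") and from the `weapon = 1/2 = 0.5` example in the same document.

```python
# Hypothetical sketch of channel-based normalization; names are illustrative only.

DETECTABLE_TAGS = ["weapon", "enemy"]  # assumed order of the observed tags


def index_of_tag(tag):
    """Return the 1-based position of a tag, matching the doc's weapon = 1/2 example."""
    return DETECTABLE_TAGS.index(tag) + 1


def normalize_channels(channel_values, channel_depth):
    """Normalize each channel value according to its ChannelDepth entry.

    A depth of 1 means the value is assumed to be already normalized;
    any other depth is treated as the number of possible values and
    used as the divisor (an assumption based on the doc's example).
    """
    if len(channel_values) != len(channel_depth):
        raise ValueError("channel_values and channel_depth must have the same length")
    normalized = []
    for value, depth in zip(channel_values, channel_depth):
        if depth == 1:
            normalized.append(float(value))  # passed through unchanged
        else:
            normalized.append(value / depth)  # categorical index scaled by depth
    return normalized


# Channel 0 is a categorical ObjectType with 2 possible tags,
# channel 1 is a continuous value that is already in [0, 1].
weapon_obs = normalize_channels([index_of_tag("weapon"), 0.25], [2, 1])
enemy_obs = normalize_channels([index_of_tag("enemy"), 0.25], [2, 1])
```

With these assumptions, the categorical channel is scaled by its depth while the continuous channel passes through unchanged.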
+As the "enemy" is in the second position of the observed tags, its value can be normalized by:
 For ObjectType, "weapon", "enemy" will be represented respectively as:
 ```
 weapon = DetectableObjects.IndexOfTag("weapon")/ChannelDepth[0] = 1/2 = 0.5;
--- a/com.unity.ml-agents/CHANGELOG.md
+++ b/com.unity.ml-agents/CHANGELOG.md
 #### ml-agents / ml-agents-envs / gym-unity (Python)
 - An issue that caused `GAIL` to fail for environments where agents can terminate episodes by self-sacrifice has been fixed. (#4971)

-
 ## [1.8.0-preview] - 2021-02-17
 ### Major Changes
 #### com.unity.ml-agents (C#)
--- a/config/ppo/PyramidsRND.yaml
+++ b/config/ppo/PyramidsRND.yaml
        strength: 0.01
        network_settings:
          hidden_units: 64
+          num_layers: 3
        learning_rate: 0.0001
    keep_checkpoints: 5
    max_steps: 3000000