
* Add benchmark thresholds for example environments

/develop-generalizationTraining-TrainerController
Arthur Juliani, 7 years ago
Commit b1a30f84
2 files changed, 19 insertions and 8 deletions
1. docs/Learning-Environment-Examples.md (16 lines changed)
2. docs/Training-Curriculum-Learning.md (11 lines changed)

docs/Learning-Environment-Examples.md (16 lines changed)


* Vector Action space: (Discrete) Two possible actions (Move left, move right).
* Visual Observations: 0
* Reset Parameters: None
* Benchmark Mean Reward: 0.94
## [3DBall: 3D Balance Ball](https://youtu.be/dheeCO29-EI)

* Vector Action space: (Continuous) Size of 2, with one value corresponding to X-rotation, and the other to Z-rotation.
* Visual Observations: 0
* Reset Parameters: None
* Benchmark Mean Reward: 100
## [GridWorld](https://youtu.be/gu8HE9WKEVI)

* Vector Action space: (Discrete) Size of 4, corresponding to movement in cardinal directions.
* Visual Observations: One corresponding to top-down view of GridWorld.
* Reset Parameters: Three, corresponding to grid size, number of obstacles, and number of goals.
* Benchmark Mean Reward: 0.8
## [Tennis](https://youtu.be/RDaIh7JX6RI)

* Vector Action space: (Continuous) Size of 2, corresponding to movement toward net or away from net, and jumping.
* Visual Observations: None
* Reset Parameters: One, corresponding to size of ball.
* Benchmark Mean Reward: 2.5
## [Push Block](https://youtu.be/jKdw216ZgoE)

* Vector Action space: (Continuous) Size of 2, corresponding to movement in X and Z directions.
* Visual Observations: None.
* Reset Parameters: None.
* Benchmark Mean Reward: 4.5
## [Wall Jump](https://youtu.be/NITLug2DIWQ)

* Vector Observation space: (Continuous) Size of 74, corresponding to 14 raycasts each detecting 4 possible objects, plus the global position of the agent and whether or not the agent is grounded.
* Visual Observations: None.
* Reset Parameters: 4, corresponding to the height of the possible walls.
* Benchmark Mean Reward (Big & Small Wall Brain): 0.8
## [Reacher](https://youtu.be/2N9EoF6pQyE)

* Goal: The agents must move their hands to the goal location and keep them there.
* Agents: The environment contains 10 agents linked to a single brain.
* Agent Reward Function (independent):
* +0.1 Each step agent's hand is in goal location.
* Brains: One brain with the following observation/action space.

* Reset Parameters: Two, corresponding to goal size, and goal movement speed.
* Benchmark Mean Reward: 30
## [Crawler](https://youtu.be/ftLliaeooYI)

* Vector Action space: (Continuous) Size of 12, corresponding to torque applicable to 12 joints.
* Visual Observations: None
* Reset Parameters: None
* Benchmark Mean Reward: 2000
## [Banana Collector](https://youtu.be/heVMs3t9qSk)

* Vector Action space: (Continuous) Size of 3, corresponding to forward movement, y-axis rotation, and whether to use laser to disable other agents.
* Visual Observations (Optional; None by default): First-person view for each agent.
* Reset Parameters: None
* Benchmark Mean Reward: 10
## [Hallway](https://youtu.be/53GyfpPQRUQ)

* Vector Action space: (Discrete) 4 corresponding to agent rotation and forward/backward movement.
* Visual Observations (Optional): First-person view for the agent.
* Reset Parameters: None
* Benchmark Mean Reward: 0.7
## [Bouncer](https://youtu.be/Tkv-c-b1b2I)

* Vector Action space: (Continuous) 3 corresponding to agent force applied for the jump.
* Visual Observations: None
* Reset Parameters: None
* Benchmark Mean Reward: 2.5
## [Soccer Twos](https://youtu.be/Hg3nmYD3DjQ)

* Goalie: 4 corresponding to forward, backward, sideways movement.
* Visual Observations: None
* Reset Parameters: None
* Benchmark Mean Reward (Striker & Goalie Brain): 0 (the means will be the inverse of each other and will criss-cross during training)
## Walker

* Vector Action space: (Continuous) Size of 39, corresponding to target rotations applicable to the joints.
* Visual Observations: None
* Reset Parameters: None
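
Each "Benchmark Mean Reward" above is the average cumulative reward per episode that a trained agent is expected to reach on the corresponding environment. As a rough, framework-agnostic sketch (the episode-running helper below is a stand-in, not part of any ML-Agents API), checking a policy against a benchmark amounts to averaging episode returns over a number of evaluation episodes:

```python
import random


def mean_episode_reward(run_episode, episodes=100):
    """Average cumulative reward over several evaluation episodes.

    run_episode is any zero-argument callable that plays one full episode
    with the trained policy and returns the summed reward for that episode;
    how the episode is actually rolled out depends on the training framework
    and is not shown here.
    """
    returns = [run_episode() for _ in range(episodes)]
    return sum(returns) / len(returns)


def fake_episode():
    # Stand-in for rolling out one episode with a trained policy.
    return 100.0 + random.uniform(-5.0, 5.0)


# With a real trained 3DBall policy, the score would be compared
# against its benchmark of 100.
score = mean_episode_reward(fake_episode)
print(f"mean reward: {score:.1f} ({'meets' if score >= 100 else 'below'} the 3DBall benchmark)")
```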

docs/Training-Curriculum-Learning.md (11 lines changed)


accomplish the task. From there, we can slowly add to the difficulty of the task by
increasing the size of the wall, until the agent can complete the initially
near-impossible task of scaling the wall. We are including just such an environment with
ML-Agents 0.2, called Wall Jump.
![Wall](images/curriculum.png)

```json
{
    "measure" : "progress",
    "thresholds" : [0.1, 0.3, 0.5],
    "parameters" :
    {
        "big_wall_min_height" : [0.0, 4.0, 6.0, 8.0],
        "big_wall_max_height" : [4.0, 7.0, 8.0, 8.0],
        "small_wall_height" : [1.5, 2.0, 2.5, 4.0]
    }
}
```
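
To make the roles of `measure`, `thresholds`, and the per-lesson parameter arrays concrete, here is a minimal sketch of how such a curriculum could be advanced; it is an illustration only (the class name and file path are made up, and this is not the actual ML-Agents trainer code). The lesson index moves forward each time the chosen measure passes the next threshold, and the environment is then reset with the parameter values at the current lesson index.

```python
import json


class SimpleCurriculum:
    """Illustrative lesson controller; not the ML-Agents implementation."""

    def __init__(self, config_path):
        with open(config_path) as f:
            data = json.load(f)
        self.measure = data["measure"]        # "progress" or "reward"
        self.thresholds = data["thresholds"]  # one threshold per lesson transition
        self.parameters = data["parameters"]  # each value is a list indexed by lesson
        self.lesson = 0

    def increment_lesson(self, measure_value):
        # Advance to the next lesson once the measure passes the current threshold.
        if self.lesson < len(self.thresholds) and measure_value > self.thresholds[self.lesson]:
            self.lesson += 1

    def reset_parameters(self):
        # Reset parameters to apply to the environment for the current lesson.
        return {name: values[self.lesson] for name, values in self.parameters.items()}


# Hypothetical usage with the Wall Jump curriculum above, taking "progress"
# to be the fraction of total training steps completed: passing 0.1 moves
# from lesson 0 to lesson 1, which raises the big wall height range.
curriculum = SimpleCurriculum("curricula/wall_jump.json")
curriculum.increment_lesson(measure_value=0.15)
print(curriculum.reset_parameters())
# {'big_wall_min_height': 4.0, 'big_wall_max_height': 7.0, 'small_wall_height': 2.0}
```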
