
* Add benchmark thresholds for example environments

/develop-generalizationTraining-TrainerController
Arthur Juliani, 7 years ago
Commit b1a30f84
2 files changed, 19 insertions and 8 deletions
1. docs/Learning-Environment-Examples.md (16 lines changed)
2. docs/Training-Curriculum-Learning.md (11 lines changed)

docs/Learning-Environment-Examples.md (16 lines changed)


* Vector Action space: (Discrete) Two possible actions (Move left, move right).
* Visual Observations: 0
* Reset Parameters: None
* Benchmark Mean Reward: 0.94
## [3DBall: 3D Balance Ball](https://youtu.be/dheeCO29-EI)

* Vector Action space: (Continuous) Size of 2, with one value corresponding to X-rotation, and the other to Z-rotation.
* Visual Observations: 0
* Reset Parameters: None
* Benchmark Mean Reward: 100
## [GridWorld](https://youtu.be/gu8HE9WKEVI)

* Vector Action space: (Discrete) Size of 4, corresponding to movement in cardinal directions.
* Visual Observations: One corresponding to top-down view of GridWorld.
* Reset Parameters: Three, corresponding to grid size, number of obstacles, and number of goals.
* Benchmark Mean Reward: 0.8
## [Tennis](https://youtu.be/RDaIh7JX6RI)

* Vector Action space: (Continuous) Size of 2, corresponding to movement toward net or away from net, and jumping.
* Visual Observations: None
* Reset Parameters: One, corresponding to size of ball.
* Benchmark Mean Reward: 2.5
## [Push Block](https://youtu.be/jKdw216ZgoE)

* Vector Action space: (Continuous) Size of 2, corresponding to movement in X and Z directions.
* Visual Observations: None.
* Reset Parameters: None.
* Benchmark Mean Reward: 4.5
## [Wall Jump](https://youtu.be/NITLug2DIWQ)

* Vector Observation space: (Continuous) Size of 74, corresponding to 14 raycasts each detecting 4 possible objects, plus the global position of the agent and whether or not the agent is grounded.
* Visual Observations: None.
* Reset Parameters: 4, corresponding to the height of the possible walls.
* Benchmark Mean Reward (Big & Small Wall Brain): 0.8
## [Reacher](https://youtu.be/2N9EoF6pQyE)

* Goal: The agents must move their hands to the goal location and keep them there.
* Agents: The environment contains 10 agents linked to a single brain.
* Agent Reward Function (independent):
* +0.1 Each step agent's hand is in goal location.
* Brains: One brain with the following observation/action space.

* Reset Parameters: Two, corresponding to goal size, and goal movement speed.
* Benchmark Mean Reward: 30
## [Crawler](https://youtu.be/ftLliaeooYI)

* Vector Action space: (Continuous) Size of 12, corresponding to torque applicable to 12 joints.
* Visual Observations: None
* Reset Parameters: None
* Benchmark Mean Reward: 2000
## [Banana Collector](https://youtu.be/heVMs3t9qSk)

* Vector Action space: (Continuous) Size of 3, corresponding to forward movement, y-axis rotation, and whether to use laser to disable other agents.
* Visual Observations (Optional; None by default): First-person view for each agent.
* Reset Parameters: None
* Benchmark Mean Reward: 10
## [Hallway](https://youtu.be/53GyfpPQRUQ)

* Vector Action space: (Discrete) 4 corresponding to agent rotation and forward/backward movement.
* Visual Observations (Optional): First-person view for the agent.
* Reset Parameters: None
* Benchmark Mean Reward: 0.7
## [Bouncer](https://youtu.be/Tkv-c-b1b2I)

* Vector Action space: (Continuous) 3 corresponding to agent force applied for the jump.
* Visual Observations: None
* Reset Parameters: None
* Benchmark Mean Reward: 2.5
## [Soccer Twos](https://youtu.be/Hg3nmYD3DjQ)

* Goalie: 4 corresponding to forward, backward, sideways movement.
* Visual Observations: None
* Reset Parameters: None
* Benchmark Mean Reward (Striker & Goalie Brain): 0 (the means will be the inverse of each other and will criss-cross during training)
## Walker

* Vector Action space: (Continuous) Size of 39, corresponding to target rotations applicable to the joints.
* Visual Observations: None
* Reset Parameters: None
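
Each "Benchmark Mean Reward" above is the average cumulative reward per episode that a trained agent is expected to reach on the corresponding environment. As a rough, framework-agnostic sketch (the episode-running helper below is a stand-in, not part of any ML-Agents API), checking a policy against a benchmark amounts to averaging episode returns over a number of evaluation episodes:

```python
import random


def mean_episode_reward(run_episode, episodes=100):
    """Average cumulative reward over several evaluation episodes.

    run_episode is any zero-argument callable that plays one full episode
    with the trained policy and returns the summed reward for that episode;
    how the episode is actually rolled out depends on the training framework
    and is not shown here.
    """
    returns = [run_episode() for _ in range(episodes)]
    return sum(returns) / len(returns)


def fake_episode():
    # Stand-in for rolling out one episode with a trained policy.
    return 100.0 + random.uniform(-5.0, 5.0)


# With a real trained 3DBall policy, the score would be compared
# against its benchmark of 100.
score = mean_episode_reward(fake_episode)
print(f"mean reward: {score:.1f} ({'meets' if score >= 100 else 'below'} the 3DBall benchmark)")
```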

docs/Training-Curriculum-Learning.md (11 lines changed)


accomplish the task. From there, we can slowly add to the difficulty of the task by
increasing the size of the wall, until the agent can complete the initially
near-impossible task of scaling the wall. We are including just such an environment with
ML-Agents 0.2, called Wall Jump.
![Wall](images/curriculum.png)

```json
{
    "measure" : "progress",
    "thresholds" : [0.1, 0.3, 0.5],
    "parameters" :
    {
        "big_wall_min_height" : [0.0, 4.0, 6.0, 8.0],
        "big_wall_max_height" : [4.0, 7.0, 8.0, 8.0],
        "small_wall_height" : [1.5, 2.0, 2.5, 4.0]
    }
}
```
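
To make the roles of `measure`, `thresholds`, and the per-lesson parameter arrays concrete, here is a minimal sketch of how such a curriculum could be advanced; it is an illustration only (the class name and file path are made up, and this is not the actual ML-Agents trainer code). The lesson index moves forward each time the chosen measure passes the next threshold, and the environment is then reset with the parameter values at the current lesson index.

```python
import json


class SimpleCurriculum:
    """Illustrative lesson controller; not the ML-Agents implementation."""

    def __init__(self, config_path):
        with open(config_path) as f:
            data = json.load(f)
        self.measure = data["measure"]        # "progress" or "reward"
        self.thresholds = data["thresholds"]  # one threshold per lesson transition
        self.parameters = data["parameters"]  # each value is a list indexed by lesson
        self.lesson = 0

    def increment_lesson(self, measure_value):
        # Advance to the next lesson once the measure passes the current threshold.
        if self.lesson < len(self.thresholds) and measure_value > self.thresholds[self.lesson]:
            self.lesson += 1

    def reset_parameters(self):
        # Reset parameters to apply to the environment for the current lesson.
        return {name: values[self.lesson] for name, values in self.parameters.items()}


# Hypothetical usage with the Wall Jump curriculum above, taking "progress"
# to be the fraction of total training steps completed: passing 0.1 moves
# from lesson 0 to lesson 1, which raises the big wall height range.
curriculum = SimpleCurriculum("curricula/wall_jump.json")
curriculum.increment_lesson(measure_value=0.15)
print(curriculum.reset_parameters())
# {'big_wall_min_height': 4.0, 'big_wall_max_height': 7.0, 'small_wall_height': 2.0}
```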
