Update Learning-Environment-Examples.md

/active-variablespeed
GitHub 4 years ago
Current commit
393630d4
1 file changed, with 19 insertions and 12 deletions
  1. 31
      docs/Learning-Environment-Examples.md



- Set-up: Physics-based Humanoid agents with 26 degrees of freedom. These DOFs
correspond to articulation of the following body-parts: hips, chest, spine,
head, thighs, shins, feet, arms, forearms and hands.
- Goal: The agents must move their bodies toward the goal direction as quickly as
possible without falling.
- `WalkerStatic` - Goal direction is always forward.
- Goal: The agents must move their bodies toward the goal direction without falling.
- `WalkerDynamicVariableSpeed` - Goal direction and walking speed are randomized.
- `WalkerStatic` - Goal direction is always forward.
- `WalkerStaticVariableSpeed` - Goal direction is always forward. Walking
speed is randomized.
- +0.02 times body velocity in the goal direction. (run towards target)
- +0.01 times head direction alignment with goal direction. (face towards target)
- +0.005 times head y position - left foot y position. (encourage head height)
- +0.005 times head y position - right foot y position. (encourage head height)
The reward function is now geometric, meaning the reward at each step is the
product of all the rewards instead of their sum. This encourages the agent to
maximize all rewards rather than only the easiest ones.
- Body velocity matches goal velocity. (normalized between (0,1))
- Head direction alignment with goal direction. (normalized between (0,1))
- Vector Observation space: 236 variables corresponding to position, rotation,
- Vector Observation space: 238 variables corresponding to position, rotation,
velocity, and angular velocities of each limb, along with goal direction.
- Vector Action space: (Continuous) Size of 39, corresponding to target
rotations and strength applicable to the joints.
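
The difference between a summed and a geometric (product) reward can be illustrated with a minimal Python sketch. It is not the environment's actual implementation: the function names and sample values below are hypothetical, the old per-term weights are omitted, and each term is assumed to be already normalized to the range 0 to 1.

```python
import math
from typing import Sequence


def additive_reward(terms: Sequence[float]) -> float:
    """Plain sum of per-step reward terms (weights omitted for simplicity)."""
    return sum(terms)


def geometric_reward(terms: Sequence[float]) -> float:
    """Product of per-step reward terms, each assumed to lie in [0, 1]."""
    for t in terms:
        if not 0.0 <= t <= 1.0:
            raise ValueError(f"reward term {t} is outside [0, 1]")
    return math.prod(terms)


# Hypothetical per-step values: velocity matching is good, head alignment is poor.
velocity_match = 0.9
head_alignment = 0.1

print("sum:    ", additive_reward([velocity_match, head_alignment]))
print("product:", geometric_reward([velocity_match, head_alignment]))
```

With a sum, the strong velocity term compensates for the weak alignment term. With a product, any term near zero pulls the whole step reward toward zero, so the agent cannot ignore it.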

- Recommended Minimum:
- Recommended Maximum:
- hip_mass: Mass of the hip component of the walker
- Default: 15
- Default: 8
- Recommended Minimum: 7
- Recommended Maximum: 28
- chest_mass: Mass of the chest component of the walker

- spine_mass: Mass of the spine component of the walker
- Default: 10
- Default: 8
- Benchmark Mean Reward for `WalkerStatic`: 1500
- Benchmark Mean Reward for `WalkerDynamic`: 700
- Benchmark Mean Reward for `WalkerDynamic`: 2500
- Benchmark Mean Reward for `WalkerDynamicVariableSpeed`: 1200
- Benchmark Mean Reward for `WalkerStatic`: 3500
- Benchmark Mean Reward for `WalkerStaticVariableSpeed`: 3000
## Pyramids
