- Set-up: Physics-based Humanoid agents with 26 degrees of freedom. These DOFs
  correspond to articulation of the following body-parts: hips, chest, spine,
  head, thighs, shins, feet, arms, forearms and hands.
- Goal: The agent must move its body toward the goal direction as quickly as
  possible without falling.
- `WalkerStatic` - Goal direction is always forward.
- Goal: The agent must move its body toward the goal direction without falling.
- `WalkerDynamicVariableSpeed` - Goal direction and walking speed are randomized.
- `WalkerStatic` - Goal direction is always forward.
- `WalkerStaticVariableSpeed` - Goal direction is always forward. Walking
  speed is randomized.
- +0.02 times body velocity in the goal direction. (run towards target)
- +0.01 times head direction alignment with goal direction. (face towards target)
- +0.005 times head y position - left foot y position. (encourage head height)
- +0.005 times head y position - right foot y position. (encourage head height)
The reward function is now geometric, meaning the reward at each step is the
product of all the reward terms rather than their sum. This encourages the
agent to maximize every term instead of only the easiest ones.
- Body velocity matches goal velocity. (normalized to the range (0, 1))
- Head direction alignment with goal direction. (normalized to the range (0, 1))
- Vector Observation space: 236 variables corresponding to position, rotation,
- Vector Observation space: 238 variables corresponding to position, rotation,
  velocity, and angular velocities of each limb, along with goal direction.
- Vector Action space: (Continuous) Size of 39, corresponding to target
  rotations and strength applicable to the joints.
- Recommended Minimum:
- Recommended Maximum:
- hip_mass: Mass of the hip component of the walker
- Default: 15
- Default: 8
- Recommended Minimum: 7
- Recommended Maximum: 28
- chest_mass: Mass of the chest component of the walker
- spine_mass: Mass of the spine component of the walker
- Default: 10
- Default: 8
- Benchmark Mean Reward for `WalkerStatic`: 1500
- Benchmark Mean Reward for `WalkerDynamic`: 700
- Benchmark Mean Reward for `WalkerDynamic`: 2500
- Benchmark Mean Reward for `WalkerDynamicVariableSpeed`: 1200
- Benchmark Mean Reward for `WalkerStatic`: 3500
- Benchmark Mean Reward for `WalkerStaticVariableSpeed`: 3000
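
The geometric Walker reward described in this section can be illustrated with a minimal Python sketch. It is not the environment's actual implementation: the function names and the `velocity_match` / `look_alignment` keys are hypothetical, and the only things taken from the text are that each term is normalized to (0, 1) and that the per-step reward is the product of the terms.

```python
from math import prod


def geometric_reward(terms):
    """Multiply reward terms that are each already normalized to [0, 1]."""
    for name, value in terms.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return prod(terms.values())


def additive_reward(terms):
    """Sum the same terms, for comparison with the product form."""
    return sum(terms.values())


# Hypothetical step: the agent matches the goal speed but looks away from it.
step = {"velocity_match": 1.0, "look_alignment": 0.0}
print(geometric_reward(step))  # 0.0
print(additive_reward(step))   # 1.0
```

With a product, neglecting any single term drives the whole step reward toward zero, whereas a sum still pays out for the term that is easy to satisfy.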
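
As a rough illustration of the recommended range listed for `hip_mass` (minimum 7, maximum 28), the sketch below draws values uniformly within that range. The constant and function names are hypothetical, and how a sampled value is passed to the environment is not shown here.

```python
import random

# Range taken from the hip_mass entry; the names below are hypothetical.
HIP_MASS_MIN = 7.0
HIP_MASS_MAX = 28.0


def sample_hip_mass(rng):
    """Draw a hip mass uniformly from the recommended range."""
    return rng.uniform(HIP_MASS_MIN, HIP_MASS_MAX)


rng = random.Random(0)  # fixed seed so repeated runs draw the same values
masses = [sample_hip_mass(rng) for _ in range(5)]
print(all(HIP_MASS_MIN <= m <= HIP_MASS_MAX for m in masses))  # True
```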
## Pyramids