* Goal: Move to the state with the highest reward.
* Agents: The environment contains one agent.
* Agent Reward Function:
  * -0.01 at each step.
  * +0.1 for arriving at the suboptimal state.
  * +1.0 for arriving at the optimal state.
* Behavior Parameters:
  * Visual Observations: None
* Float Properties: None
* Benchmark Mean Reward: 0.94
* Benchmark Mean Reward: 0.93
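
The reward function listed above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the environment's actual implementation. The names `step_reward` and `episode_return` are hypothetical, and the sketch assumes that the per-step penalty also applies on the step that reaches a rewarded state and that the episode ends on arrival.

```python
from typing import Optional

# Reward values taken from the list above.
STEP_PENALTY = -0.01
SUBOPTIMAL_REWARD = 0.1
OPTIMAL_REWARD = 1.0


def step_reward(arrived_at: Optional[str] = None) -> float:
    """Reward for one step (hypothetical helper).

    arrived_at is None for an ordinary step, or "suboptimal" / "optimal"
    when the step lands on one of the two rewarded states.
    Assumption: the step penalty is still charged on the arrival step.
    """
    reward = STEP_PENALTY
    if arrived_at == "suboptimal":
        reward += SUBOPTIMAL_REWARD
    elif arrived_at == "optimal":
        reward += OPTIMAL_REWARD
    return reward


def episode_return(steps: int, final_state: str) -> float:
    """Total reward for an episode that ends on its last step (assumption)."""
    if steps < 1:
        raise ValueError("an episode needs at least one step")
    # All steps except the last are ordinary steps.
    total = sum(step_reward() for _ in range(steps - 1))
    return total + step_reward(final_state)


if __name__ == "__main__":
    # Compare a longer route to the optimal state with a short route
    # to the suboptimal state (step counts chosen only for illustration).
    print(round(episode_return(7, "optimal"), 2))
    print(round(episode_return(3, "suboptimal"), 2))
```

Under these assumptions, the small step penalty means a longer path to the optimal state still pays far more than a short path to the suboptimal one.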

## [3DBall: 3D Balance Ball](https://youtu.be/dheeCO29-EI)