
Fix spelling error in documentation (#2636)

/develop-gpu-test
GitHub 5 years ago
Current commit
17b3a805
2 files changed, with 4 insertions and 4 deletions
  1. docs/Training-PPO.md (2 changed lines)
  2. docs/Training-SAC.md (6 changed lines)

2
docs/Training-PPO.md


steps: 10000
```
-Below are the avaliable hyperparameters for pretraining.
+Below are the available hyperparameters for pretraining.
### Strength

6
docs/Training-SAC.md


steps: 10000
```
-Below are the avaliable hyperparameters for pretraining.
+Below are the available hyperparameters for pretraining.
### Strength

This corresponds to how random the decisions of a Brain are. This should
initially increase during training, reach a peak, and should decline along
with the Entropy Coefficient. This is because in the beginning, the agent is
-incentivised to be more random for exploration due to a high entropy coefficient.
+incentivized to be more random for exploration due to a high entropy coefficient.
If it decreases too soon or takes too long to decrease, `init_entcoef` should be adjusted.
### Learning Rate

### Policy Loss
-These values may increase as the agent explores, but should decrease longterm
+These values may increase as the agent explores, but should decrease long-term
as the agent learns how to solve the task.
### Value Estimate
