steps: 10000
```

Below are the available hyperparameters for pretraining.

### Entropy
|
This corresponds to how random the decisions of a Brain are. This should
initially increase during training, reach a peak, and should decline along
with the Entropy Coefficient. This is because in the beginning, the agent is
incentivized to be more random for exploration due to a high entropy coefficient.
If it decreases too soon or takes too long to decrease, `init_entcoef` should be adjusted.
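
To show where that adjustment lives, here is a minimal sketch of a per-Brain
trainer configuration entry, assuming the `trainer_config.yaml` format; the
Brain name `MyBrain` and the value shown are illustrative placeholders, not
recommended settings:

```yaml
MyBrain:    # hypothetical Brain name; substitute your own
    trainer: sac
    # Raise init_entcoef if entropy collapses too soon (too little early
    # exploration); lower it if entropy takes too long to decline.
    init_entcoef: 0.5
```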
|
### Learning Rate |
|
### Policy Loss |
|
These values may increase as the agent explores, but should decrease long-term
as the agent learns how to solve the task.
|
### Value Estimate |
|
|
|