浏览代码

Typos in Getting Started with Ball documentation (#187)

Fixed typos
/tag-0.2.1
Arthur Juliani 7 年前
当前提交
0c79edf9
共有 1 个文件被更改,包括 2 次插入2 次删除
  1. 4
      docs/Getting-Started-with-Balance-Ball.md

4
docs/Getting-Started-with-Balance-Ball.md


From Tensorboard, you will see the summary statistics of six variables:
* Cumulative Reward - The mean cumulative episode reward over all agents. Should increase during a successful training session.
* Value Loss - The mean loss of the value function update. Correlates to how well the model is able to predict the value of each state. This should decrease during a succesful training session.
* Policy Loss - The mean loss of the policy function update. Correlates to how much the policy (process for deciding actions) is changing. The magnitude of this should decrease during a succesful training session.
* Value Loss - The mean loss of the value function update. Correlates to how well the model is able to predict the value of each state. This should decrease during a successful training session.
* Policy Loss - The mean loss of the policy function update. Correlates to how much the policy (process for deciding actions) is changing. The magnitude of this should decrease during a successful training session.
* Episode Length - The mean length of each episode in the environment for all agents.
* Value Estimates - The mean value estimate for all states visited by the agent. Should increase during a successful training session.
* Policy Entropy - How random the decisions of the model are. Should slowly decrease during a successful training process. If it decreases too quickly, the `beta` hyperparameter should be increased.

正在加载...
取消
保存