
Minor edits to documentation (#5239)

/check-for-ModelOverriders
GitHub · 4 years ago
Current commit: 5367efeb
3 files changed, with 4 insertions and 4 deletions
  1. docs/Background-Machine-Learning.md (2 changes)
  2. docs/ML-Agents-Overview.md (2 changes)
  3. docs/Using-Tensorboard.md (4 changes)

docs/Background-Machine-Learning.md (2 changes)


learned mapping.
- For our reinforcement learning example, the training phase learns the optimal
policy through guided trials, and in the inference phase, the agent observes
- and tales actions in the wild using its learned policy.
+ and takes actions in the wild using its learned policy.
To briefly summarize: all three classes of algorithms involve training and
inference phases in addition to attribute and model selections. What ultimately
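
The passage above contrasts a training phase (learning a policy through trials) with an inference phase (acting with the learned policy). As a concrete illustration only, here is a minimal, self-contained sketch of that split using a toy tabular Q-learning agent; the chain environment, hyperparameters, and function names are made up for this example and are not ML-Agents code.

```python
# Toy sketch of the training/inference split (illustrative only, not ML-Agents code).
import random

N_STATES, N_ACTIONS = 5, 2          # states 0..4; actions: 0 = left, 1 = right
GOAL = N_STATES - 1

def step(state, action):
    """Move along a chain; reward 1.0 only when the goal state is reached."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def train(episodes=200, alpha=0.5, gamma=0.9):
    """Training phase: learn a Q-table from exploratory (here: random) trials."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = random.randrange(N_ACTIONS)          # explore during training
            nxt, reward, done = step(state, action)
            # Standard Q-learning update.
            q[state][action] += alpha * (reward + gamma * max(q[nxt]) - q[state][action])
            state = nxt
    return q

def act(q, state):
    """Inference phase: observe the state and take the greedy learned action."""
    return max(range(N_ACTIONS), key=lambda a: q[state][a])

if __name__ == "__main__":
    q_table = train()                                      # training phase
    print([act(q_table, s) for s in range(N_STATES)])      # inference: learned action per state
```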

docs/ML-Agents-Overview.md (2 changes)


works well when there are a limited number of demonstrations. In this framework,
a second neural network, the discriminator, is taught to distinguish whether an
observation/action is from a demonstration or produced by the agent. This
- discriminator can the examine a new observation/action and provide it a reward
+ discriminator can then examine a new observation/action and provide it a reward
based on how close it believes this new observation/action is to the provided
demonstrations.
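
As an illustration of the discriminator described in this hunk, here is a minimal PyTorch-style sketch; it is not the ML-Agents GAIL implementation, and the network sizes, labels, and the -log(1 - D) reward shaping are assumptions made for the example.

```python
# Illustrative GAIL-style discriminator sketch (not the ML-Agents implementation).
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Classifies an (observation, action) pair as demonstration-like or agent-generated."""
    def __init__(self, obs_size, act_size, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_size + act_size, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),   # output: P(pair comes from a demonstration)
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def discriminator_update(disc, optimizer, demo_obs, demo_act, agent_obs, agent_act):
    """One training step: demonstration pairs are labelled 1, agent pairs are labelled 0."""
    bce = nn.BCELoss()
    loss = bce(disc(demo_obs, demo_act), torch.ones(len(demo_obs))) + \
           bce(disc(agent_obs, agent_act), torch.zeros(len(agent_obs)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def gail_reward(disc, obs, act):
    """Reward is higher the more demonstration-like the new (observation, action) pair looks."""
    with torch.no_grad():
        d = disc(obs, act)
    return -torch.log(1.0 - d + 1e-8)   # one common GAIL reward shaping (an assumption here)
```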

docs/Using-Tensorboard.md (4 changes)


This should increase while the agent is learning, and then decrease once the
reward stabilizes.
- - `Losses/Forward Loss` (PPO/SAC+Curiosity) - The mean magnitude of the inverse
+ - `Losses/Forward Loss` (PPO/SAC+Curiosity) - The mean magnitude of the forward
- - `Losses/Inverse Loss` (PPO/SAC+Curiosity) - The mean magnitude of the forward
+ - `Losses/Inverse Loss` (PPO/SAC+Curiosity) - The mean magnitude of the inverse
model loss function. Corresponds to how well the model is able to predict the
action taken between two observations.
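
For readers comparing the two entries above, here is a minimal sketch of where a forward and an inverse model loss like these come from in a Curiosity-style module; it is not the ML-Agents implementation, and the encoder size and the use of MSE over continuous actions are assumptions made for the example.

```python
# Illustrative Curiosity-module sketch (not the ML-Agents implementation).
import torch
import torch.nn as nn

class CuriosityModule(nn.Module):
    def __init__(self, obs_size, act_size, enc_size=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_size, enc_size), nn.ReLU())
        # Forward model: (current encoding, action) -> predicted next encoding.
        self.forward_model = nn.Linear(enc_size + act_size, enc_size)
        # Inverse model: (current encoding, next encoding) -> predicted action.
        self.inverse_model = nn.Linear(2 * enc_size, act_size)

    def losses(self, obs, next_obs, action):
        enc, next_enc = self.encoder(obs), self.encoder(next_obs)
        pred_next_enc = self.forward_model(torch.cat([enc, action], dim=-1))
        pred_action = self.inverse_model(torch.cat([enc, next_enc], dim=-1))
        # Forward loss: how well the next observation encoding is predicted.
        forward_loss = nn.functional.mse_loss(pred_next_enc, next_enc.detach())
        # Inverse loss: how well the action taken between the two observations is predicted.
        inverse_loss = nn.functional.mse_loss(pred_action, action)  # continuous actions assumed
        return forward_loss, inverse_loss
```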
