Minor edits to documentation (#5239)

4 years ago · 5367efeb
--- a/docs/Background-Machine-Learning.md
+++ b/docs/Background-Machine-Learning.md
  learned mapping.
 - For our reinforcement learning example, the training phase learns the optimal
  policy through guided trials, and in the inference phase, the agent observes
-  and tales actions in the wild using its learned policy.
+  and takes actions in the wild using its learned policy.

 To briefly summarize: all three classes of algorithms involve training and
 inference phases in addition to attribute and model selections. What ultimately
--- a/docs/ML-Agents-Overview.md
+++ b/docs/ML-Agents-Overview.md
 works well when there are a limited number of demonstrations. In this framework,
 a second neural network, the discriminator, is taught to distinguish whether an
 observation/action is from a demonstration or produced by the agent. This
-discriminator can the examine a new observation/action and provide it a reward
+discriminator can then examine a new observation/action and provide it a reward
 based on how close it believes this new observation/action is to the provided
 demonstrations.

--- a/docs/Using-Tensorboard.md
+++ b/docs/Using-Tensorboard.md
  This should increase while the agent is learning, and then decrease once the
  reward stabilizes.

- `Losses/Forward Loss` (PPO/SAC+Curiosity) - The mean magnitude of the inverse
+- `Losses/Forward Loss` (PPO/SAC+Curiosity) - The mean magnitude of the forward
- `Losses/Inverse Loss` (PPO/SAC+Curiosity) - The mean magnitude of the forward
+- `Losses/Inverse Loss` (PPO/SAC+Curiosity) - The mean magnitude of the inverse
  model loss function. Corresponds to how well the model is able to predict the
  action taken between two observations.
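
The swap corrected in the last hunk is easy to reintroduce, so a small sketch of the two quantities may help. The inverse loss follows the wording in the diff (predicting the action taken between two observations). The description of the forward loss as predicting the next observation from the current observation and action is an assumption based on the usual curiosity-module formulation, not something this diff states. All function names here are hypothetical, and mean squared error is used only as an illustrative loss, not necessarily what ML-Agents computes.

```python
def mean_squared_error(predicted, target):
    """Average squared difference between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def forward_loss(predict_next, obs, action, next_obs):
    """Forward model (assumed form): (obs, action) -> predicted next obs."""
    return mean_squared_error(predict_next(obs, action), next_obs)


def inverse_loss(predict_action, obs, next_obs, action):
    """Inverse model: (obs, next_obs) -> predicted action taken between them."""
    return mean_squared_error(predict_action(obs, next_obs), action)


# Hypothetical toy dynamics: the next observation is obs + action.
def toy_forward(obs, action):
    return [o + a for o, a in zip(obs, action)]


def toy_inverse(obs, next_obs):
    return [n - o for o, n in zip(obs, next_obs)]


# A deliberately poor forward model that ignores the action.
def stale_forward(obs, action):
    return list(obs)


obs, action, next_obs = [0.0, 1.0], [0.5, -0.5], [0.5, 0.5]
perfect_forward = forward_loss(toy_forward, obs, action, next_obs)
perfect_inverse = inverse_loss(toy_inverse, obs, next_obs, action)
poor_forward = forward_loss(stale_forward, obs, action, next_obs)
```

In this toy setup both perfect models give a loss of zero, while the forward model that ignores the action gives a larger forward loss, which is the kind of difference the two TensorBoard scalars are meant to separate.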