added a sentence in the machine learning background doc (#4043)

5 years ago · 009dbaf5
--- a/docs/Background-Machine-Learning.md
+++ b/docs/Background-Machine-Learning.md
 water hose and whether the hose is on or off).

 The last remaining piece of the reinforcement learning task is the **reward
-signal**. When training a robot to be a mean firefighting machine, we provide it
+signal**. The robot is trained to learn a policy that maximizes its overall rewards. When training a robot to be a mean firefighting machine, we provide it
 with rewards (positive and negative) indicating how well it is doing on
 completing the task. Note that the robot does not _know_ how to put out fires
 before it is trained. It learns the objective because it receives a large