|
|
|
|
|
|
|
|
|
|
## Sample Environment |
|
|
|
|
|
|
|
Imagine a task in which an agent needs to scale a wall to arrive at a goal. The starting
point when training an agent to accomplish this task will be a random policy. That
starting policy will have the agent running in circles, and will likely never, or only
very rarely, scale the wall properly to achieve the reward. If we start with a simpler
task, such as moving toward an unobstructed goal, then the agent can easily learn to
accomplish it. From there, we can slowly add to the difficulty of the task by increasing
the size of the wall, until the agent can complete the initially near-impossible task of
scaling it. We are including just such an environment with the ML-Agents toolkit, called
Wall Area.
|
|
|
|
|
|
_Demonstration of a curriculum training scenario in which a progressively taller wall
obstructs the path to the goal._
|
|
|
|
|
|
|
|
|
|
To see this in action, observe the two learning curves below. Each displays the reward
over time for an agent trained using PPO with the same set of training hyperparameters.
The difference is that one agent was trained directly on the full version of the task,
while the other was trained using the curriculum version of the task. As you can see,
without curriculum learning the agent has a lot of difficulty. We think that by using
well-crafted curricula, agents trained using reinforcement learning will be able to
accomplish tasks that would otherwise be much more difficult.
|
|
|
|
|
|
|
|
|
|
|
In order to define a curriculum, the first step is to decide which parameters of the
environment will vary. In the case of the Wall Area environment, what varies is the
height of the wall. We define this as a `Reset Parameter` in the Academy object of our
scene, and by doing so it becomes adjustable via the Python API. Rather than adjusting
it by hand, we then create a simple JSON file which describes the structure of the
curriculum. Within it, we can specify the points in the training process at which our
wall height will change, based either on the percentage of training steps that have
elapsed or on the average reward the agent has recently received.
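As an aside, a `Reset Parameter` declared on the Academy can also be set directly from
Python. The sketch below assumes the `unityagents` package from this era of the toolkit,
whose `UnityEnvironment.reset` accepts a `config` dictionary of reset parameters; the
executable name `WallArea` is a placeholder.

```python
from unityagents import UnityEnvironment

# Connect to a built Unity environment (the executable name is a placeholder).
env = UnityEnvironment(file_name="WallArea")

# Override Academy Reset Parameters for this reset. The keys must match the
# names declared in the Academy's Reset Parameters; here we request a low
# wall, as in the first lesson of the curriculum.
info = env.reset(train_mode=True, config={
    "big_wall_min_height": 0.0,
    "big_wall_max_height": 4.0,
})

env.close()
```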
|
|
|
|
|
|
|
|
|
|
Once these are in place, we simply launch learn.py with the `--curriculum` flag pointing
to the JSON file, and PPO will train using curriculum learning. Of course, we can then
keep track of the current lesson and progress via TensorBoard.
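For example, the launch might look like `python learn.py --curriculum=wall.json --train`;
the file name `wall.json` and the `--train` flag are assumptions, and only the
`--curriculum` flag is taken from the text above, so check the arguments against your
version of the toolkit. Below is an example curriculum file for the Wall Area
environment: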
|
|
|
|
|
|
|
|
|
|
|
|
|
|
"thresholds" : [0.1, 0.3, 0.5], |
|
|
|
"min_lesson_length" : 2, |
|
|
|
"signal_smoothing" : true, |
|
|
|
"parameters" : |
|
|
|
"signal_smoothing" : true, |
|
|
|
"parameters" : |
|
|
|
{ |
|
|
|
"big_wall_min_height" : [0.0, 4.0, 6.0, 8.0], |
|
|
|
"big_wall_max_height" : [4.0, 7.0, 8.0, 8.0], |
|
|
|
|
|
|
``` |
|
|
|
|
|
|
|
* `measure` - The metric by which learning progress is measured and lessons are advanced.
  * `reward` - Uses a measure of the reward received.
  * `progress` - Uses the ratio of steps taken to max_steps.
* `thresholds` (float array) - Points in the value of `measure` at which the lesson
  should be incremented.
* `min_lesson_length` (int) - How many times the progress measure should be reported
  before incrementing the lesson.
* `signal_smoothing` (true/false) - Whether to weight the current progress measure by
  previous values.
  * If `true`, weighting will be 0.75 (new) 0.25 (old).
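To make these fields concrete, here is a minimal sketch of the lesson-advancement logic
they describe. This is an illustration rather than the toolkit's actual implementation;
the class name `Curriculum` and its methods are hypothetical.

```python
import json


class Curriculum:
    """Illustrative lesson scheduler driven by a curriculum JSON file."""

    def __init__(self, json_path):
        with open(json_path) as f:
            self.config = json.load(f)
        self.lesson = 0
        self.smoothed_value = 0.0
        self.reports = 0  # measure reports since the last lesson change

    def increment_progress(self, measure_value):
        """Report one value of `measure` and advance the lesson if warranted."""
        if self.config["signal_smoothing"]:
            # Weight the new value 0.75 and the previous running value 0.25.
            measure_value = 0.75 * measure_value + 0.25 * self.smoothed_value
            self.smoothed_value = measure_value
        self.reports += 1
        thresholds = self.config["thresholds"]
        if (self.lesson < len(thresholds)
                and self.reports >= self.config["min_lesson_length"]
                and measure_value > thresholds[self.lesson]):
            self.lesson += 1
            self.reports = 0

    def get_reset_config(self):
        """Reset parameters (name -> value) for the current lesson."""
        return {name: values[self.lesson]
                for name, values in self.config["parameters"].items()}
```

With the file above, `get_reset_config()` at lesson 0 would return
`{"big_wall_min_height": 0.0, "big_wall_max_height": 4.0}`, and because the measure is
`progress`, the lesson would advance after roughly 10%, 30%, and 50% of training steps
have elapsed.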