| **Property** | | **Description** |
| :-- | :-- | :-- |
| **Continuous Player Actions** | | The mapping for the continuous vector action space. Shown when the action space is **Continuous**. |
| | **Size** | The number of key commands defined. You can assign more than one command to the same action index in order to send different values for that action. (If multiple mapped keys are pressed at the same time, the result is not guaranteed to be deterministic.) |
| | **Element 0–N** | The mapping of keys to action values. |
| | **Key** | The key on the keyboard. |
| | **Index** | The element of the Agent's action vector to set when this key is pressed. Because it is an array index, this value must be less than the size of the action space. |
| | **Value** | The value to send to the Agent as its action for the specified index when the mapped key is pressed. All other members of the action vector are set to 0. |
| **Discrete Player Actions** | | The mapping for the discrete vector action space. Shown when the action space is **Discrete**. |
| | **Size** | The number of key commands defined. |
| | **Element 0–N** | The mapping of keys to action values. |
| | **Key** | The key on the keyboard. |
| | **Branch Index** | The element of the Agent's action vector to set when this key is pressed. Because it is an array index, this value must be less than the size of the action space. |
| | **Value** | The value to send to the Agent as its action when the mapped key is pressed. Because values are zero-based, this value must be less than the size of the associated branch. |
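
To make the mapping concrete, the following is a minimal sketch of how an Agent might consume a discrete key mapping. The class name and key bindings are hypothetical (it assumes the Player Brain maps the `W` key to branch 0 with value 1, and `S` to branch 0 with value 2), and it uses the `AgentAction(float[] vectorAction, string textAction)` callback of this ML-Agents version:

```csharp
using UnityEngine;
// Depending on your ML-Agents version, the Agent base class may live in the
// MLAgents namespace, requiring: using MLAgents;

// Hypothetical Agent: assumes the Player Brain maps W -> (branch 0, value 1)
// and S -> (branch 0, value 2). When no mapped key is pressed, the value is 0.
public class KeyboardDrivenAgent : Agent
{
    public override void AgentAction(float[] vectorAction, string textAction)
    {
        int command = (int)vectorAction[0]; // value sent for branch 0
        if (command == 1)
            transform.Translate(0f, 0f, 0.1f);  // W pressed: move forward
        else if (command == 2)
            transform.Translate(0f, 0f, -0.1f); // S pressed: move backward
        // command == 0: no mapped key pressed this step
    }
}
```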
For more information about the Unity input system, see [Input](https://docs.unity3d.com/ScriptReference/Input.html).
Name each section of the trainer configuration file after the GameObject containing the Brain component that should use these settings. (This GameObject will be a child of the Academy in your scene.)
Sections for the example environments are included in the provided config file.
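
For example, a configuration file with one override section might look like the following sketch. The section and Brain names are hypothetical, and the values are placeholders rather than tuned recommendations:

```yaml
# trainer_config.yaml (illustrative excerpt)
default:                 # defaults applied to every Brain
    trainer: ppo
    batch_size: 1024
    buffer_size: 10240
    gamma: 0.99
    hidden_units: 128
    num_layers: 2
    max_steps: 5.0e4
    summary_freq: 1000

StudentBrain:            # hypothetical Brain GameObject name; overrides defaults
    trainer: imitation
    brain_to_imitate: TeacherBrain   # hypothetical teacher Brain GameObject
    batches_per_epoch: 5
    max_steps: 10000
```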
| **Setting** | **Description** | **Applies To Trainer\***|
| :-- | :-- | :-- |
| batch_size | The number of experiences in each iteration of gradient descent.| PPO, BC |
| batches_per_epoch | In imitation learning, the number of batches of training examples to collect before training the model.| BC |
| beta | The strength of entropy regularization.| PPO |
| brain\_to\_imitate | For imitation learning, the name of the GameObject containing the Brain component to imitate. | BC |
| buffer_size | The number of experiences to collect before updating the policy model. | PPO |
| curiosity\_enc\_size | The size of the encoding to use in the forward and inverse models of the Curiosity module. | PPO |
| curiosity_strength | The magnitude of the intrinsic reward generated by the Intrinsic Curiosity Module. | PPO |
| epsilon | The acceptable threshold of divergence between the old and new policies during updates; influences how rapidly the policy can evolve during training.| PPO |
| gamma | The reward discount rate for the Generalized Advantage Estimator (GAE). | PPO |
| hidden_units | The number of units in the hidden layers of the neural network. | PPO, BC |
| lambd | The regularization parameter (lambda) used when calculating the Generalized Advantage Estimate (GAE). | PPO |
| learning_rate | The initial learning rate for gradient descent. | PPO, BC |
| max_steps | The maximum number of simulation steps to run during a training session. | PPO, BC |
| memory_size | The size of the memory an agent must keep. Used for training with a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, BC |
| num_epoch | The number of passes to make through the experience buffer when performing gradient descent optimization. | PPO |
| num_layers | The number of hidden layers in the neural network. | PPO, BC |
| sequence_length | Defines how long the sequences of experiences must be while training. Only used for training with a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, BC |
| summary_freq | How often, in steps, to save training statistics. This determines the number of data points shown by TensorBoard. | PPO, BC |
| time_horizon | How many steps of experience to collect per agent before adding them to the experience buffer. | PPO, BC |
| trainer | The type of training to perform: "ppo" or "imitation".| PPO, BC |
| use_curiosity | Train using an additional intrinsic reward signal generated by the Intrinsic Curiosity Module. | PPO |
| use_recurrent | Train using a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md).| PPO, BC |
\*PPO = Proximal Policy Optimization, BC = Behavioral Cloning (Imitation)
* `mlagents.trainers`: A set of Reinforcement Learning algorithms designed to
  be used with Unity environments. Access them through the `mlagents-learn`
  command-line entry point. See
  [Training ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-ML-Agents.md)
  for more information on using this package.
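
For example, a typical invocation looks like this (the configuration path and run ID shown are placeholders you choose yourself):

```sh
# Launch training with a trainer configuration file and a run identifier;
# --train enables training mode rather than inference.
mlagents-learn config/trainer_config.yaml --run-id=first-run --train
```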