
Replacing training screenshots with updated markdown. (#1152)

* Replacing training screenshots with updated markdown.

* Changing bash prompt to be simpler.
Branch: develop-generalizationTraining-TrainerController
Vincent(Yuan) Gao, 6 years ago
Current commit: 423a6bc9
5 files changed: 179 insertions(+), 616 deletions(-)
1. docs/Basic-Guide.md (87 changes)
2. docs/Learning-Environment-Executable.md (95 changes)
3. docs/Migrating.md (3 changes)
4. docs/images/training-command-example.png (191 changes)
5. docs/images/training-running.png (419 changes)

docs/Basic-Guide.md (87 changes)


page](Learning-Environment-Executable.md) for instructions on how to build and
use an executable.
![Training command example](images/training-command-example.png)
```console
ml-agents$ mlagents-learn config/trainer_config.yaml --run-id=first-run --train
                        ▄▄▄▓▓▓▓
                   ╓▓▓▓▓▓▓█▓▓▓▓▓
              ,▄▄▄m▀▀▀'  ,▓▓▓▀▓▓▄                           ▓▓▓  ▓▓▌
            ▄▓▓▓▀'      ▄▓▓▀  ▓▓▓      ▄▄     ▄▄ ,▄▄ ▄▄▄▄   ,▄▄ ▄▓▓▌▄ ▄▄▄    ,▄▄
          ▄▓▓▓▀        ▄▓▓▀   ▐▓▓▌     ▓▓▌   ▐▓▓ ▐▓▓▓▀▀▀▓▓▌ ▓▓▓ ▀▓▓▌▀ ^▓▓▌  ╒▓▓▌
        ▄▓▓▓▓▓▄▄▄▄▄▄▄▄▓▓▓      ▓▀      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌   ▐▓▓▄ ▓▓▌
        ▀▓▓▓▓▀▀▀▀▀▀▀▀▀▀▓▓▄     ▓▓      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌    ▐▓▓▐▓▓
          ^█▓▓▓        ▀▓▓▄   ▐▓▓▌     ▓▓▓▓▄▓▓▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▓▄    ▓▓▓▓`
            '▀▓▓▓▄      ^▓▓▓  ▓▓▓      └▀▀▀▀ ▀▀ ^▀▀    `▀▀ `▀▀   '▀▀    ▐▓▓▌
               ▀▀▀▀▓▄▄▄   ▓▓▓▓▓▓,                                      ▓▓▓▓▀
                   `▀█▓▓▓▓▓▓▓▓▓▌
                        ¬`▀▀▀█▓
INFO:mlagents.learn:{'--curriculum': 'None',
'--docker-target-name': 'Empty',
'--env': 'None',
'--help': False,
'--keep-checkpoints': '5',
'--lesson': '0',
'--load': False,
'--no-graphics': False,
'--num-runs': '1',
'--run-id': 'first-run',
'--save-freq': '50000',
'--seed': '-1',
'--slow': False,
'--train': True,
'--worker-id': '0',
'<trainer-config-path>': 'config/trainer_config.yaml'}
```
**Note**: If you're using Anaconda, don't forget to activate the ml-agents
environment first.
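For example, assuming you named the conda environment `ml-agents` during setup (the name is an assumption; substitute whatever you created):

```sh
# Activate the conda environment first (on older conda versions use
# `source activate ml-agents`), then launch training as above.
conda activate ml-agents
mlagents-learn config/trainer_config.yaml --run-id=first-run --train
```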

![Training running](images/training-running.png)
```console
INFO:mlagents.envs:Start training by pressing the Play button in the Unity Editor.
INFO:mlagents.envs:
'Ball3DAcademy' started successfully!
Unity Academy name: Ball3DAcademy
Number of Brains: 1
Number of External Brains : 1
Reset Parameters :
Unity brain name: Ball3DBrain
Number of Visual Observations (per agent): 0
Vector Observation space size (per agent): 8
Number of stacked Vector Observation: 1
Vector Action space type: continuous
Vector Action space size (per agent): [2]
Vector Action descriptions: ,
INFO:mlagents.envs:Hyperparameters for the PPO Trainer of brain Ball3DBrain:
batch_size: 64
beta: 0.001
buffer_size: 12000
epsilon: 0.2
gamma: 0.995
hidden_units: 128
lambd: 0.99
learning_rate: 0.0003
max_steps: 5.0e4
normalize: True
num_epoch: 3
num_layers: 2
time_horizon: 1000
sequence_length: 64
summary_freq: 1000
use_recurrent: False
graph_scope:
summary_path: ./summaries/first-run-0
memory_size: 256
use_curiosity: False
curiosity_strength: 0.01
curiosity_enc_size: 128
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 1000. Mean Reward: 1.242. Std of Reward: 0.746. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 2000. Mean Reward: 1.319. Std of Reward: 0.693. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 3000. Mean Reward: 1.804. Std of Reward: 1.056. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 4000. Mean Reward: 2.151. Std of Reward: 1.432. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 5000. Mean Reward: 3.175. Std of Reward: 2.250. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 6000. Mean Reward: 4.898. Std of Reward: 4.019. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 7000. Mean Reward: 6.716. Std of Reward: 5.125. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 8000. Mean Reward: 12.124. Std of Reward: 11.929. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 9000. Mean Reward: 18.151. Std of Reward: 16.871. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 10000. Mean Reward: 27.284. Std of Reward: 28.667. Training.
```
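The hyperparameters echoed above come from the trainer configuration file passed on the command line. As a minimal sketch (reconstructed from the values in the log; the exact layout of your `config/trainer_config.yaml` may differ), the entry for this brain would look something like:

```yaml
# Sketch of a trainer_config.yaml entry for Ball3DBrain, reconstructed
# from the hyperparameters printed in the training log above.
Ball3DBrain:
    normalize: true
    batch_size: 64
    buffer_size: 12000
    beta: 0.001
    epsilon: 0.2
    gamma: 0.995
    lambd: 0.99
    learning_rate: 0.0003
    max_steps: 5.0e4
    num_epoch: 3
    num_layers: 2
    hidden_units: 128
    time_horizon: 1000
    summary_freq: 1000
```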
### After training

docs/Learning-Environment-Executable.md (95 changes)


ml-agents/python directory, run:
```sh
mlagents-learn config/trainer_config.yaml --env=3DBall --run-id=first-run --train
```
![Training command example](images/training-command-example.png)
And you should see something like
```console
ml-agents$ mlagents-learn config/trainer_config.yaml --env=3DBall --run-id=first-run --train
                        ▄▄▄▓▓▓▓
                   ╓▓▓▓▓▓▓█▓▓▓▓▓
              ,▄▄▄m▀▀▀'  ,▓▓▓▀▓▓▄                           ▓▓▓  ▓▓▌
            ▄▓▓▓▀'      ▄▓▓▀  ▓▓▓      ▄▄     ▄▄ ,▄▄ ▄▄▄▄   ,▄▄ ▄▓▓▌▄ ▄▄▄    ,▄▄
          ▄▓▓▓▀        ▄▓▓▀   ▐▓▓▌     ▓▓▌   ▐▓▓ ▐▓▓▓▀▀▀▓▓▌ ▓▓▓ ▀▓▓▌▀ ^▓▓▌  ╒▓▓▌
        ▄▓▓▓▓▓▄▄▄▄▄▄▄▄▓▓▓      ▓▀      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌   ▐▓▓▄ ▓▓▌
        ▀▓▓▓▓▀▀▀▀▀▀▀▀▀▀▓▓▄     ▓▓      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌    ▐▓▓▐▓▓
          ^█▓▓▓        ▀▓▓▄   ▐▓▓▌     ▓▓▓▓▄▓▓▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▓▄    ▓▓▓▓`
            '▀▓▓▓▄      ^▓▓▓  ▓▓▓      └▀▀▀▀ ▀▀ ^▀▀    `▀▀ `▀▀   '▀▀    ▐▓▓▌
               ▀▀▀▀▓▄▄▄   ▓▓▓▓▓▓,                                      ▓▓▓▓▀
                   `▀█▓▓▓▓▓▓▓▓▓▌
                        ¬`▀▀▀█▓
INFO:mlagents.learn:{'--curriculum': 'None',
'--docker-target-name': 'Empty',
'--env': '3DBall',
'--help': False,
'--keep-checkpoints': '5',
'--lesson': '0',
'--load': False,
'--no-graphics': False,
'--num-runs': '1',
'--run-id': 'first-run',
'--save-freq': '50000',
'--seed': '-1',
'--slow': False,
'--train': True,
'--worker-id': '0',
'<trainer-config-path>': 'config/trainer_config.yaml'}
```
**Note**: If you're using Anaconda, don't forget to activate the ml-agents
environment first.

![Training running](images/training-running.png)
```console
CrashReporter: initialized
Mono path[0] = '/Users/dericp/workspace/ml-agents/3DBall.app/Contents/Resources/Data/Managed'
Mono config path = '/Users/dericp/workspace/ml-agents/3DBall.app/Contents/MonoBleedingEdge/etc'
INFO:mlagents.envs:
'Ball3DAcademy' started successfully!
Unity Academy name: Ball3DAcademy
Number of Brains: 1
Number of External Brains : 1
Reset Parameters :
Unity brain name: Ball3DBrain
Number of Visual Observations (per agent): 0
Vector Observation space size (per agent): 8
Number of stacked Vector Observation: 1
Vector Action space type: continuous
Vector Action space size (per agent): [2]
Vector Action descriptions: ,
INFO:mlagents.envs:Hyperparameters for the PPO Trainer of brain Ball3DBrain:
batch_size: 64
beta: 0.001
buffer_size: 12000
epsilon: 0.2
gamma: 0.995
hidden_units: 128
lambd: 0.99
learning_rate: 0.0003
max_steps: 5.0e4
normalize: True
num_epoch: 3
num_layers: 2
time_horizon: 1000
sequence_length: 64
summary_freq: 1000
use_recurrent: False
graph_scope:
summary_path: ./summaries/first-run-0
memory_size: 256
use_curiosity: False
curiosity_strength: 0.01
curiosity_enc_size: 128
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 1000. Mean Reward: 1.242. Std of Reward: 0.746. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 2000. Mean Reward: 1.319. Std of Reward: 0.693. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 3000. Mean Reward: 1.804. Std of Reward: 1.056. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 4000. Mean Reward: 2.151. Std of Reward: 1.432. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 5000. Mean Reward: 3.175. Std of Reward: 2.250. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 6000. Mean Reward: 4.898. Std of Reward: 4.019. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 7000. Mean Reward: 6.716. Std of Reward: 5.125. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 8000. Mean Reward: 12.124. Std of Reward: 11.929. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 9000. Mean Reward: 18.151. Std of Reward: 16.871. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 10000. Mean Reward: 27.284. Std of Reward: 28.667. Training.
```
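To monitor these statistics while training runs (or afterwards), you can point TensorBoard at the `summaries/` folder that `mlagents-learn` writes to; `first-run-0` matches the `summary_path` shown in the log above:

```sh
# Serve the training summaries written by mlagents-learn, then open
# http://localhost:6006 in a browser to view the reward curves.
tensorboard --logdir=summaries
```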
You can press Ctrl+C to stop the training, and your trained model will be at
`models/<run-identifier>/<env_name>_<run-identifier>.bytes`, which corresponds
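As a concrete (hypothetical) instance of that pattern: with `--run-id=first-run` and the `3DBall` environment used above, the run identifier becomes `first-run-0` (matching the `summary_path` in the log), so the model would be saved to a path like:

```console
models/first-run-0/3DBall_first-run-0.bytes
```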

docs/Migrating.md (3 changes)


* In order to run a training session, you can now use the command
  `mlagents-learn` instead of `python3 learn.py` after installing the `mlagents`
  packages. This change is documented
  [here](Training-ML-Agents.md#training-with-mlagents-learn).
* It is now required to specify the path to the yaml trainer configuration file
when running `mlagents-learn`. For example, see
[trainer_config.yaml](../config/trainer_config.yaml).
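As a sketch of the change, a session previously launched with `python3 learn.py` would now be started like this (the run id is a placeholder; only the configuration-file argument is required):

```sh
# Before the migration (pre-mlagents package):
#   python3 learn.py --run-id=my-run --train
# After: the trainer configuration path is passed explicitly.
mlagents-learn config/trainer_config.yaml --run-id=my-run --train
```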

docs/images/training-command-example.png (191 changes)


docs/images/training-running.png (419 changes)
