|
|
|
|
|
|
ml-agents/python directory, run: |
|
|
|
|
|
|
|
```sh
mlagents-learn config/trainer_config.yaml --env=3DBall --run-id=first-run --train
```

![Training command example](images/training-command-example.png)
|
|
|
You should see output similar to the following:
|
|
|
|
|
|
|
```console |
|
|
|
ml-agents$ mlagents-learn config/trainer_config.yaml --env=3DBall --run-id=first-run --train |
|
|
|
|
|
|
|
|
|
|
|
▄▄▄▓▓▓▓ |
|
|
|
╓▓▓▓▓▓▓█▓▓▓▓▓ |
|
|
|
,▄▄▄m▀▀▀' ,▓▓▓▀▓▓▄ ▓▓▓ ▓▓▌ |
|
|
|
▄▓▓▓▀' ▄▓▓▀ ▓▓▓ ▄▄ ▄▄ ,▄▄ ▄▄▄▄ ,▄▄ ▄▓▓▌▄ ▄▄▄ ,▄▄ |
|
|
|
▄▓▓▓▀ ▄▓▓▀ ▐▓▓▌ ▓▓▌ ▐▓▓ ▐▓▓▓▀▀▀▓▓▌ ▓▓▓ ▀▓▓▌▀ ^▓▓▌ ╒▓▓▌ |
|
|
|
▄▓▓▓▓▓▄▄▄▄▄▄▄▄▓▓▓ ▓▀ ▓▓▌ ▐▓▓ ▐▓▓ ▓▓▓ ▓▓▓ ▓▓▌ ▐▓▓▄ ▓▓▌ |
|
|
|
▀▓▓▓▓▀▀▀▀▀▀▀▀▀▀▓▓▄ ▓▓ ▓▓▌ ▐▓▓ ▐▓▓ ▓▓▓ ▓▓▓ ▓▓▌ ▐▓▓▐▓▓ |
|
|
|
^█▓▓▓ ▀▓▓▄ ▐▓▓▌ ▓▓▓▓▄▓▓▓▓ ▐▓▓ ▓▓▓ ▓▓▓ ▓▓▓▄ ▓▓▓▓` |
|
|
|
'▀▓▓▓▄ ^▓▓▓ ▓▓▓ └▀▀▀▀ ▀▀ ^▀▀ `▀▀ `▀▀ '▀▀ ▐▓▓▌ |
|
|
|
▀▀▀▀▓▄▄▄ ▓▓▓▓▓▓, ▓▓▓▓▀ |
|
|
|
`▀█▓▓▓▓▓▓▓▓▓▌ |
|
|
|
¬`▀▀▀█▓ |
|
|
|
|
|
|
|
|
|
|
|
INFO:mlagents.learn:{'--curriculum': 'None', |
|
|
|
'--docker-target-name': 'Empty', |
|
|
|
'--env': '3DBall', |
|
|
|
'--help': False, |
|
|
|
'--keep-checkpoints': '5', |
|
|
|
'--lesson': '0', |
|
|
|
'--load': False, |
|
|
|
'--no-graphics': False, |
|
|
|
'--num-runs': '1', |
|
|
|
'--run-id': 'first-run',
|
|
|
'--save-freq': '50000', |
|
|
|
'--seed': '-1', |
|
|
|
'--slow': False, |
|
|
|
'--train': True, |
|
|
|
'--worker-id': '0', |
|
|
|
'<trainer-config-path>': 'config/trainer_config.yaml'} |
|
|
|
``` |
|
|
|
|
|
|
|
**Note**: If you're using Anaconda, don't forget to activate the ml-agents environment first.
|
|
|
|
|
|
|
|
|
|
![Training running](images/training-running.png) |
|
|
|
```console |
|
|
|
CrashReporter: initialized |
|
|
|
Mono path[0] = '/Users/dericp/workspace/ml-agents/3DBall.app/Contents/Resources/Data/Managed' |
|
|
|
Mono config path = '/Users/dericp/workspace/ml-agents/3DBall.app/Contents/MonoBleedingEdge/etc' |
|
|
|
INFO:mlagents.envs: |
|
|
|
'Ball3DAcademy' started successfully! |
|
|
|
|
|
|
Unity Academy name: Ball3DAcademy |
|
|
|
Number of Brains: 1 |
|
|
|
Number of External Brains : 1 |
|
|
|
Reset Parameters : |
|
|
|
|
|
|
|
Unity brain name: Ball3DBrain |
|
|
|
Number of Visual Observations (per agent): 0 |
|
|
|
Vector Observation space size (per agent): 8 |
|
|
|
Number of stacked Vector Observation: 1 |
|
|
|
Vector Action space type: continuous |
|
|
|
Vector Action space size (per agent): [2] |
|
|
|
Vector Action descriptions: , |
|
|
|
INFO:mlagents.envs:Hyperparameters for the PPO Trainer of brain Ball3DBrain: |
|
|
|
batch_size: 64 |
|
|
|
beta: 0.001 |
|
|
|
buffer_size: 12000 |
|
|
|
epsilon: 0.2 |
|
|
|
gamma: 0.995 |
|
|
|
hidden_units: 128 |
|
|
|
lambd: 0.99 |
|
|
|
learning_rate: 0.0003 |
|
|
|
max_steps: 5.0e4 |
|
|
|
normalize: True |
|
|
|
num_epoch: 3 |
|
|
|
num_layers: 2 |
|
|
|
time_horizon: 1000 |
|
|
|
sequence_length: 64 |
|
|
|
summary_freq: 1000 |
|
|
|
use_recurrent: False |
|
|
|
graph_scope: |
|
|
|
summary_path: ./summaries/first-run-0 |
|
|
|
memory_size: 256 |
|
|
|
use_curiosity: False |
|
|
|
curiosity_strength: 0.01 |
|
|
|
curiosity_enc_size: 128 |
|
|
|
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 1000. Mean Reward: 1.242. Std of Reward: 0.746. Training. |
|
|
|
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 2000. Mean Reward: 1.319. Std of Reward: 0.693. Training. |
|
|
|
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 3000. Mean Reward: 1.804. Std of Reward: 1.056. Training. |
|
|
|
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 4000. Mean Reward: 2.151. Std of Reward: 1.432. Training. |
|
|
|
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 5000. Mean Reward: 3.175. Std of Reward: 2.250. Training. |
|
|
|
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 6000. Mean Reward: 4.898. Std of Reward: 4.019. Training. |
|
|
|
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 7000. Mean Reward: 6.716. Std of Reward: 5.125. Training. |
|
|
|
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 8000. Mean Reward: 12.124. Std of Reward: 11.929. Training. |
|
|
|
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 9000. Mean Reward: 18.151. Std of Reward: 16.871. Training. |
|
|
|
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 10000. Mean Reward: 27.284. Std of Reward: 28.667. Training. |
|
|
|
``` |
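The hyperparameters echoed in the log come from `config/trainer_config.yaml`, keyed by brain name. A hypothetical fragment consistent with the values printed above might look like this (the actual file in your checkout may differ by version, and keys you omit fall back to the file's defaults):

```yaml
# Sketch of a config/trainer_config.yaml entry; values mirror the log above.
Ball3DBrain:
    normalize: true
    batch_size: 64
    buffer_size: 12000
    gamma: 0.995
    lambd: 0.99
    beta: 0.001
    max_steps: 5.0e4
    summary_freq: 1000
    time_horizon: 1000
```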
|
|
|
|
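The `Mean Reward` value in those trainer lines is the quickest signal that learning is progressing (for 3DBall it should climb steadily toward the maximum of 100). If you want to track it outside the console, a minimal sketch (not part of ML-Agents; the regex assumes the log format shown above) that extracts step/reward pairs:

```python
import re

# Matches trainer progress lines like:
# "INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 1000. Mean Reward: 1.242. ..."
PROGRESS = re.compile(r"Step: (\d+)\. Mean Reward: (\d+\.\d+)")

def parse_progress(lines):
    """Return (step, mean_reward) pairs found in an iterable of log lines."""
    out = []
    for line in lines:
        m = PROGRESS.search(line)
        if m:
            out.append((int(m.group(1)), float(m.group(2))))
    return out

log = [
    "INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 1000. Mean Reward: 1.242. Std of Reward: 0.746. Training.",
    "INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 2000. Mean Reward: 1.319. Std of Reward: 0.693. Training.",
]
print(parse_progress(log))  # [(1000, 1.242), (2000, 1.319)]
```

You could feed this the trainer's console output to plot reward over time, though TensorBoard (pointed at the `summaries/` directory shown in the log) is the usual way to monitor training.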
|
|
|
You can press Ctrl+C to stop the training, and your trained model will be at `models/<run-identifier>/<env_name>_<run-identifier>.bytes`, which corresponds
|
|
|