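The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents.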

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
Unity Technologies
Version information:
ml-agents: 0.18.0.dev0,
ml-agents-envs: 0.18.0.dev0,
Communicator API: 1.0.0,
TensorFlow: 2.2.0
2020-06-30 19:27:07 WARNING [learn.py:293] The --train option has been deprecated. Train mode is now the default. Use --inference to run in inference mode.
2020-06-30 19:27:13 INFO [environment.py:108] Connected to Unity environment with package version 1.1.0-preview and communication version 1.0.0
2020-06-30 19:27:13 INFO [environment.py:108] Connected to Unity environment with package version 1.1.0-preview and communication version 1.0.0
2020-06-30 19:27:13 INFO [environment.py:108] Connected to Unity environment with package version 1.1.0-preview and communication version 1.0.0
2020-06-30 19:27:13 INFO [environment.py:108] Connected to Unity environment with package version 1.1.0-preview and communication version 1.0.0
2020-06-30 19:27:13 INFO [environment.py:108] Connected to Unity environment with package version 1.1.0-preview and communication version 1.0.0
2020-06-30 19:27:13 INFO [environment.py:108] Connected to Unity environment with package version 1.1.0-preview and communication version 1.0.0
2020-06-30 19:27:13 INFO [environment.py:108] Connected to Unity environment with package version 1.1.0-preview and communication version 1.0.0
2020-06-30 19:27:13 INFO [environment.py:108] Connected to Unity environment with package version 1.1.0-preview and communication version 1.0.0
2020-06-30 19:27:13 INFO [environment.py:265] Connected new brain:
WalkerStatic?team=0
2020-06-30 19:27:13 INFO [environment.py:265] Connected new brain:
WalkerStatic?team=0
2020-06-30 19:27:13 INFO [environment.py:265] Connected new brain:
WalkerStatic?team=0
2020-06-30 19:27:13 INFO [environment.py:265] Connected new brain:
WalkerStatic?team=0
2020-06-30 19:27:13 INFO [environment.py:265] Connected new brain:
WalkerStatic?team=0
2020-06-30 19:27:13 INFO [environment.py:265] Connected new brain:
WalkerStatic?team=0
2020-06-30 19:27:13 INFO [environment.py:265] Connected new brain:
WalkerStatic?team=0
2020-06-30 19:27:13 INFO [environment.py:265] Connected new brain:
WalkerStatic?team=0
2020-06-30 19:27:13.315118: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-06-30 19:27:13.335366: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2000144999 Hz
2020-06-30 19:27:13.344789: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fb664000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-30 19:27:13.344830: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-06-30 19:27:13.347132: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-06-30 19:27:13.347164: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
2020-06-30 19:27:13.347197: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (job-brandonh-woldobst-wst-ppo-td52x): /proc/driver/nvidia/version does not exist
2020-06-30 19:27:13 INFO [stats.py:130] Hyperparameters for behavior name WalkerStatic:
	trainer_type: ppo
	hyperparameters:
	  batch_size: 2048
	  buffer_size: 20480
	  learning_rate: 0.0003
	  beta: 0.005
	  epsilon: 0.2
	  lambd: 0.95
	  num_epoch: 3
	  learning_rate_schedule: linear
	network_settings:
	  normalize: True
	  hidden_units: 512
	  num_layers: 3
	  vis_encode_type: simple
	  memory: None
	reward_signals:
	  extrinsic:
	    gamma: 0.995
	    strength: 1.0
	init_path: None
	keep_checkpoints: 5
	checkpoint_interval: 500000
	max_steps: 20000000
	time_horizon: 1000
	summary_freq: 30000
	threaded: True
	self_play: None
	behavioral_cloning: None
2020-06-30 19:27:42 INFO [stats.py:111] WalkerStatic: Step: 30000. Time Elapsed: 34.738 s Mean Reward: 1.978. Std of Reward: 2.314. Training.
2020-06-30 19:28:09 INFO [stats.py:111] WalkerStatic: Step: 60000. Time Elapsed: 61.963 s Mean Reward: 2.194. Std of Reward: 2.302. Training.
2020-06-30 19:28:39 INFO [stats.py:111] WalkerStatic: Step: 90000. Time Elapsed: 92.525 s Mean Reward: 2.455. Std of Reward: 2.283. Training.
2020-06-30 19:29:06 INFO [stats.py:111] WalkerStatic: Step: 120000. Time Elapsed: 119.208 s Mean Reward: 2.724. Std of Reward: 2.283. Training.
2020-06-30 19:29:36 INFO [stats.py:111] WalkerStatic: Step: 150000. Time Elapsed: 149.442 s Mean Reward: 2.924. Std of Reward: 2.233. Training.
2020-06-30 19:30:03 INFO [stats.py:111] WalkerStatic: Step: 180000. Time Elapsed: 176.157 s Mean Reward: 3.217. Std of Reward: 2.171. Training.
2020-06-30 19:30:33 INFO [stats.py:111] WalkerStatic: Step: 210000. Time Elapsed: 206.360 s Mean Reward: 3.385. Std of Reward: 2.013. Training.
2020-06-30 19:31:00 INFO [stats.py:111] WalkerStatic: Step: 240000. Time Elapsed: 233.122 s Mean Reward: 3.657. Std of Reward: 2.041. Training.
2020-06-30 19:31:30 INFO [stats.py:111] WalkerStatic: Step: 270000. Time Elapsed: 263.222 s Mean Reward: 3.735. Std of Reward: 2.043. Training.
2020-06-30 19:31:57 INFO [stats.py:111] WalkerStatic: Step: 300000. Time Elapsed: 289.884 s Mean Reward: 3.901. Std of Reward: 2.024. Training.
2020-06-30 19:32:28 INFO [stats.py:111] WalkerStatic: Step: 330000. Time Elapsed: 321.224 s Mean Reward: 3.977. Std of Reward: 1.960. Training.
2020-06-30 19:32:55 INFO [stats.py:111] WalkerStatic: Step: 360000. Time Elapsed: 348.048 s Mean Reward: 4.153. Std of Reward: 1.980. Training.
2020-06-30 19:33:25 INFO [stats.py:111] WalkerStatic: Step: 390000. Time Elapsed: 378.544 s Mean Reward: 4.261. Std of Reward: 2.031. Training.
2020-06-30 19:33:52 INFO [stats.py:111] WalkerStatic: Step: 420000. Time Elapsed: 405.147 s Mean Reward: 4.413. Std of Reward: 2.060. Training.
2020-06-30 19:34:19 INFO [stats.py:111] WalkerStatic: Step: 450000. Time Elapsed: 431.658 s Mean Reward: 4.611. Std of Reward: 2.034. Training.
2020-06-30 19:34:49 INFO [stats.py:111] WalkerStatic: Step: 480000. Time Elapsed: 462.127 s Mean Reward: 4.672. Std of Reward: 2.171. Training.
2020-06-30 19:35:08 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 19:35:16 INFO [stats.py:111] WalkerStatic: Step: 510000. Time Elapsed: 488.640 s Mean Reward: 4.954. Std of Reward: 2.174. Training.
2020-06-30 19:35:45 INFO [stats.py:111] WalkerStatic: Step: 540000. Time Elapsed: 518.436 s Mean Reward: 5.124. Std of Reward: 2.141. Training.
2020-06-30 19:36:12 INFO [stats.py:111] WalkerStatic: Step: 570000. Time Elapsed: 544.826 s Mean Reward: 5.191. Std of Reward: 2.279. Training.
2020-06-30 19:36:42 INFO [stats.py:111] WalkerStatic: Step: 600000. Time Elapsed: 574.617 s Mean Reward: 5.465. Std of Reward: 2.233. Training.
2020-06-30 19:37:08 INFO [stats.py:111] WalkerStatic: Step: 630000. Time Elapsed: 600.969 s Mean Reward: 5.751. Std of Reward: 2.336. Training.
2020-06-30 19:37:39 INFO [stats.py:111] WalkerStatic: Step: 660000. Time Elapsed: 631.868 s Mean Reward: 5.860. Std of Reward: 2.425. Training.
2020-06-30 19:38:05 INFO [stats.py:111] WalkerStatic: Step: 690000. Time Elapsed: 658.448 s Mean Reward: 6.011. Std of Reward: 2.506. Training.
2020-06-30 19:38:35 INFO [stats.py:111] WalkerStatic: Step: 720000. Time Elapsed: 688.112 s Mean Reward: 6.344. Std of Reward: 2.755. Training.
2020-06-30 19:39:01 INFO [stats.py:111] WalkerStatic: Step: 750000. Time Elapsed: 714.452 s Mean Reward: 6.734. Std of Reward: 2.865. Training.
2020-06-30 19:39:31 INFO [stats.py:111] WalkerStatic: Step: 780000. Time Elapsed: 744.447 s Mean Reward: 7.073. Std of Reward: 2.995. Training.
2020-06-30 19:39:58 INFO [stats.py:111] WalkerStatic: Step: 810000. Time Elapsed: 771.220 s Mean Reward: 7.283. Std of Reward: 3.245. Training.
2020-06-30 19:40:24 INFO [stats.py:111] WalkerStatic: Step: 840000. Time Elapsed: 797.608 s Mean Reward: 7.633. Std of Reward: 3.371. Training.
2020-06-30 19:41:00 INFO [stats.py:111] WalkerStatic: Step: 870000. Time Elapsed: 832.908 s Mean Reward: 7.869. Std of Reward: 3.600. Training.
2020-06-30 19:41:26 INFO [stats.py:111] WalkerStatic: Step: 900000. Time Elapsed: 859.225 s Mean Reward: 8.514. Std of Reward: 4.200. Training.
2020-06-30 19:41:56 INFO [stats.py:111] WalkerStatic: Step: 930000. Time Elapsed: 888.732 s Mean Reward: 9.210. Std of Reward: 4.421. Training.
2020-06-30 19:42:22 INFO [stats.py:111] WalkerStatic: Step: 960000. Time Elapsed: 914.933 s Mean Reward: 9.415. Std of Reward: 4.683. Training.
2020-06-30 19:42:52 INFO [stats.py:111] WalkerStatic: Step: 990000. Time Elapsed: 945.358 s Mean Reward: 10.107. Std of Reward: 5.419. Training.
2020-06-30 19:43:00 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 19:43:19 INFO [stats.py:111] WalkerStatic: Step: 1020000. Time Elapsed: 972.048 s Mean Reward: 10.510. Std of Reward: 5.874. Training.
2020-06-30 19:43:49 INFO [stats.py:111] WalkerStatic: Step: 1050000. Time Elapsed: 1001.823 s Mean Reward: 10.883. Std of Reward: 5.983. Training.
2020-06-30 19:44:15 INFO [stats.py:111] WalkerStatic: Step: 1080000. Time Elapsed: 1027.956 s Mean Reward: 11.509. Std of Reward: 6.569. Training.
2020-06-30 19:44:45 INFO [stats.py:111] WalkerStatic: Step: 1110000. Time Elapsed: 1058.031 s Mean Reward: 12.118. Std of Reward: 6.920. Training.
2020-06-30 19:45:11 INFO [stats.py:111] WalkerStatic: Step: 1140000. Time Elapsed: 1084.251 s Mean Reward: 13.037. Std of Reward: 8.162. Training.
2020-06-30 19:45:41 INFO [stats.py:111] WalkerStatic: Step: 1170000. Time Elapsed: 1114.130 s Mean Reward: 13.550. Std of Reward: 8.023. Training.
2020-06-30 19:46:07 INFO [stats.py:111] WalkerStatic: Step: 1200000. Time Elapsed: 1140.268 s Mean Reward: 13.949. Std of Reward: 8.838. Training.
2020-06-30 19:46:33 INFO [stats.py:111] WalkerStatic: Step: 1230000. Time Elapsed: 1166.440 s Mean Reward: 14.876. Std of Reward: 9.734. Training.
2020-06-30 19:47:03 INFO [stats.py:111] WalkerStatic: Step: 1260000. Time Elapsed: 1196.204 s Mean Reward: 16.681. Std of Reward: 10.371. Training.
2020-06-30 19:47:29 INFO [stats.py:111] WalkerStatic: Step: 1290000. Time Elapsed: 1222.427 s Mean Reward: 16.768. Std of Reward: 11.312. Training.
2020-06-30 19:48:00 INFO [stats.py:111] WalkerStatic: Step: 1320000. Time Elapsed: 1253.404 s Mean Reward: 16.467. Std of Reward: 10.997. Training.
2020-06-30 19:48:26 INFO [stats.py:111] WalkerStatic: Step: 1350000. Time Elapsed: 1278.845 s Mean Reward: 17.741. Std of Reward: 12.678. Training.
2020-06-30 19:48:55 INFO [stats.py:111] WalkerStatic: Step: 1380000. Time Elapsed: 1307.954 s Mean Reward: 19.410. Std of Reward: 13.895. Training.
2020-06-30 19:49:21 INFO [stats.py:111] WalkerStatic: Step: 1410000. Time Elapsed: 1334.116 s Mean Reward: 20.186. Std of Reward: 13.687. Training.
2020-06-30 19:49:51 INFO [stats.py:111] WalkerStatic: Step: 1440000. Time Elapsed: 1363.763 s Mean Reward: 22.083. Std of Reward: 16.249. Training.
2020-06-30 19:50:17 INFO [stats.py:111] WalkerStatic: Step: 1470000. Time Elapsed: 1390.014 s Mean Reward: 22.977. Std of Reward: 17.124. Training.
2020-06-30 19:50:46 INFO [stats.py:111] WalkerStatic: Step: 1500000. Time Elapsed: 1419.495 s Mean Reward: 23.849. Std of Reward: 18.422. Training.
2020-06-30 19:50:46 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 19:51:14 INFO [stats.py:111] WalkerStatic: Step: 1530000. Time Elapsed: 1446.689 s Mean Reward: 26.294. Std of Reward: 19.475. Training.
2020-06-30 19:51:43 INFO [stats.py:111] WalkerStatic: Step: 1560000. Time Elapsed: 1475.991 s Mean Reward: 29.422. Std of Reward: 22.363. Training.
2020-06-30 19:52:09 INFO [stats.py:111] WalkerStatic: Step: 1590000. Time Elapsed: 1502.015 s Mean Reward: 29.504. Std of Reward: 23.433. Training.
2020-06-30 19:52:35 INFO [stats.py:111] WalkerStatic: Step: 1620000. Time Elapsed: 1527.653 s Mean Reward: 36.097. Std of Reward: 29.048. Training.
2020-06-30 19:53:05 INFO [stats.py:111] WalkerStatic: Step: 1650000. Time Elapsed: 1558.339 s Mean Reward: 34.160. Std of Reward: 28.837. Training.
2020-06-30 19:53:31 INFO [stats.py:111] WalkerStatic: Step: 1680000. Time Elapsed: 1583.709 s Mean Reward: 38.897. Std of Reward: 31.709. Training.
2020-06-30 19:54:00 INFO [stats.py:111] WalkerStatic: Step: 1710000. Time Elapsed: 1613.559 s Mean Reward: 37.088. Std of Reward: 32.600. Training.
2020-06-30 19:54:26 INFO [stats.py:111] WalkerStatic: Step: 1740000. Time Elapsed: 1639.534 s Mean Reward: 42.410. Std of Reward: 35.296. Training.
2020-06-30 19:54:57 INFO [stats.py:111] WalkerStatic: Step: 1770000. Time Elapsed: 1670.435 s Mean Reward: 44.661. Std of Reward: 37.505. Training.
2020-06-30 19:55:23 INFO [stats.py:111] WalkerStatic: Step: 1800000. Time Elapsed: 1696.340 s Mean Reward: 45.880. Std of Reward: 40.187. Training.
2020-06-30 19:55:53 INFO [stats.py:111] WalkerStatic: Step: 1830000. Time Elapsed: 1726.196 s Mean Reward: 54.174. Std of Reward: 46.083. Training.
2020-06-30 19:56:19 INFO [stats.py:111] WalkerStatic: Step: 1860000. Time Elapsed: 1751.757 s Mean Reward: 61.143. Std of Reward: 50.394. Training.
2020-06-30 19:56:49 INFO [stats.py:111] WalkerStatic: Step: 1890000. Time Elapsed: 1782.307 s Mean Reward: 57.075. Std of Reward: 53.276. Training.
2020-06-30 19:57:15 INFO [stats.py:111] WalkerStatic: Step: 1920000. Time Elapsed: 1808.233 s Mean Reward: 67.421. Std of Reward: 61.448. Training.
2020-06-30 19:57:45 INFO [stats.py:111] WalkerStatic: Step: 1950000. Time Elapsed: 1838.153 s Mean Reward: 67.331. Std of Reward: 64.187. Training.
2020-06-30 19:58:11 INFO [stats.py:111] WalkerStatic: Step: 1980000. Time Elapsed: 1863.640 s Mean Reward: 74.886. Std of Reward: 73.696. Training.
2020-06-30 19:58:30 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 19:58:37 INFO [stats.py:111] WalkerStatic: Step: 2010000. Time Elapsed: 1890.341 s Mean Reward: 82.776. Std of Reward: 74.078. Training.
2020-06-30 19:59:06 INFO [stats.py:111] WalkerStatic: Step: 2040000. Time Elapsed: 1919.592 s Mean Reward: 101.606. Std of Reward: 97.906. Training.
2020-06-30 19:59:32 INFO [stats.py:111] WalkerStatic: Step: 2070000. Time Elapsed: 1944.627 s Mean Reward: 109.852. Std of Reward: 102.757. Training.
2020-06-30 19:59:59 INFO [stats.py:111] WalkerStatic: Step: 2100000. Time Elapsed: 1972.061 s Mean Reward: 104.378. Std of Reward: 92.640. Training.
2020-06-30 20:00:27 INFO [stats.py:111] WalkerStatic: Step: 2130000. Time Elapsed: 2000.090 s Mean Reward: 104.704. Std of Reward: 95.021. Training.
2020-06-30 20:00:56 INFO [stats.py:111] WalkerStatic: Step: 2160000. Time Elapsed: 2028.748 s Mean Reward: 120.102. Std of Reward: 114.455. Training.
2020-06-30 20:01:21 INFO [stats.py:111] WalkerStatic: Step: 2190000. Time Elapsed: 2054.356 s Mean Reward: 118.264. Std of Reward: 103.865. Training.
2020-06-30 20:01:53 INFO [stats.py:111] WalkerStatic: Step: 2220000. Time Elapsed: 2086.317 s Mean Reward: 131.923. Std of Reward: 122.951. Training.
2020-06-30 20:02:14 INFO [stats.py:111] WalkerStatic: Step: 2250000. Time Elapsed: 2107.517 s Mean Reward: 146.586. Std of Reward: 148.745. Training.
2020-06-30 20:02:48 INFO [stats.py:111] WalkerStatic: Step: 2280000. Time Elapsed: 2140.615 s Mean Reward: 150.572. Std of Reward: 121.779. Training.
2020-06-30 20:03:09 INFO [stats.py:111] WalkerStatic: Step: 2310000. Time Elapsed: 2161.819 s Mean Reward: 161.319. Std of Reward: 153.468. Training.
2020-06-30 20:03:39 INFO [stats.py:111] WalkerStatic: Step: 2340000. Time Elapsed: 2192.265 s Mean Reward: 166.215. Std of Reward: 161.677. Training.
2020-06-30 20:04:06 INFO [stats.py:111] WalkerStatic: Step: 2370000. Time Elapsed: 2219.356 s Mean Reward: 177.255. Std of Reward: 156.756. Training.
2020-06-30 20:04:34 INFO [stats.py:111] WalkerStatic: Step: 2400000. Time Elapsed: 2246.868 s Mean Reward: 195.038. Std of Reward: 184.191. Training.
2020-06-30 20:05:03 INFO [stats.py:111] WalkerStatic: Step: 2430000. Time Elapsed: 2276.069 s Mean Reward: 191.869. Std of Reward: 184.492. Training.
2020-06-30 20:05:29 INFO [stats.py:111] WalkerStatic: Step: 2460000. Time Elapsed: 2301.838 s Mean Reward: 223.819. Std of Reward: 194.444. Training.
2020-06-30 20:05:58 INFO [stats.py:111] WalkerStatic: Step: 2490000. Time Elapsed: 2331.011 s Mean Reward: 253.198. Std of Reward: 228.213. Training.
2020-06-30 20:06:05 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 20:06:25 INFO [stats.py:111] WalkerStatic: Step: 2520000. Time Elapsed: 2357.805 s Mean Reward: 248.790. Std of Reward: 203.701. Training.
2020-06-30 20:06:52 INFO [stats.py:111] WalkerStatic: Step: 2550000. Time Elapsed: 2384.871 s Mean Reward: 295.692. Std of Reward: 252.496. Training.
2020-06-30 20:07:16 INFO [stats.py:111] WalkerStatic: Step: 2580000. Time Elapsed: 2409.551 s Mean Reward: 255.902. Std of Reward: 222.713. Training.
2020-06-30 20:07:42 INFO [stats.py:111] WalkerStatic: Step: 2610000. Time Elapsed: 2435.492 s Mean Reward: 227.317. Std of Reward: 219.841. Training.
2020-06-30 20:08:10 INFO [stats.py:111] WalkerStatic: Step: 2640000. Time Elapsed: 2462.953 s Mean Reward: 276.052. Std of Reward: 227.341. Training.
2020-06-30 20:08:38 INFO [stats.py:111] WalkerStatic: Step: 2670000. Time Elapsed: 2490.633 s Mean Reward: 290.838. Std of Reward: 242.330. Training.
2020-06-30 20:09:08 INFO [stats.py:111] WalkerStatic: Step: 2700000. Time Elapsed: 2520.914 s Mean Reward: 311.639. Std of Reward: 255.585. Training.
2020-06-30 20:09:33 INFO [stats.py:111] WalkerStatic: Step: 2730000. Time Elapsed: 2545.924 s Mean Reward: 292.821. Std of Reward: 240.725. Training.
2020-06-30 20:10:00 INFO [stats.py:111] WalkerStatic: Step: 2760000. Time Elapsed: 2572.723 s Mean Reward: 296.539. Std of Reward: 288.399. Training.
2020-06-30 20:10:26 INFO [stats.py:111] WalkerStatic: Step: 2790000. Time Elapsed: 2599.492 s Mean Reward: 302.853. Std of Reward: 251.559. Training.
2020-06-30 20:10:56 INFO [stats.py:111] WalkerStatic: Step: 2820000. Time Elapsed: 2629.234 s Mean Reward: 351.301. Std of Reward: 273.091. Training.
2020-06-30 20:11:23 INFO [stats.py:111] WalkerStatic: Step: 2850000. Time Elapsed: 2655.710 s Mean Reward: 386.802. Std of Reward: 261.570. Training.
2020-06-30 20:11:57 INFO [stats.py:111] WalkerStatic: Step: 2880000. Time Elapsed: 2690.180 s Mean Reward: 428.035. Std of Reward: 316.296. Training.
2020-06-30 20:12:17 INFO [stats.py:111] WalkerStatic: Step: 2910000. Time Elapsed: 2709.813 s Mean Reward: 463.855. Std of Reward: 336.803. Training.
2020-06-30 20:12:39 INFO [stats.py:111] WalkerStatic: Step: 2940000. Time Elapsed: 2732.418 s Mean Reward: 455.670. Std of Reward: 342.691. Training.
2020-06-30 20:13:14 INFO [stats.py:111] WalkerStatic: Step: 2970000. Time Elapsed: 2766.916 s Mean Reward: 445.434. Std of Reward: 299.043. Training.
2020-06-30 20:13:39 INFO [stats.py:111] WalkerStatic: Step: 3000000. Time Elapsed: 2792.214 s Mean Reward: 537.432. Std of Reward: 339.999. Training.
2020-06-30 20:13:39 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 20:14:05 INFO [stats.py:111] WalkerStatic: Step: 3030000. Time Elapsed: 2817.664 s Mean Reward: 474.982. Std of Reward: 348.173. Training.
2020-06-30 20:14:33 INFO [stats.py:111] WalkerStatic: Step: 3060000. Time Elapsed: 2846.009 s Mean Reward: 579.587. Std of Reward: 295.912. Training.
2020-06-30 20:15:01 INFO [stats.py:111] WalkerStatic: Step: 3090000. Time Elapsed: 2874.240 s Mean Reward: 606.422. Std of Reward: 328.595. Training.
2020-06-30 20:15:26 INFO [stats.py:111] WalkerStatic: Step: 3120000. Time Elapsed: 2898.819 s Mean Reward: 506.874. Std of Reward: 358.808. Training.
2020-06-30 20:15:54 INFO [stats.py:111] WalkerStatic: Step: 3150000. Time Elapsed: 2927.051 s Mean Reward: 606.362. Std of Reward: 355.151. Training.
2020-06-30 20:16:20 INFO [stats.py:111] WalkerStatic: Step: 3180000. Time Elapsed: 2953.249 s Mean Reward: 615.864. Std of Reward: 375.788. Training.
2020-06-30 20:16:46 INFO [stats.py:111] WalkerStatic: Step: 3210000. Time Elapsed: 2979.142 s Mean Reward: 667.120. Std of Reward: 336.440. Training.
2020-06-30 20:17:13 INFO [stats.py:111] WalkerStatic: Step: 3240000. Time Elapsed: 3005.902 s Mean Reward: 624.549. Std of Reward: 340.657. Training.
2020-06-30 20:17:41 INFO [stats.py:111] WalkerStatic: Step: 3270000. Time Elapsed: 3034.527 s Mean Reward: 631.965. Std of Reward: 370.518. Training.
2020-06-30 20:18:10 INFO [stats.py:111] WalkerStatic: Step: 3300000. Time Elapsed: 3063.244 s Mean Reward: 632.884. Std of Reward: 379.162. Training.
2020-06-30 20:18:35 INFO [stats.py:111] WalkerStatic: Step: 3330000. Time Elapsed: 3087.892 s Mean Reward: 649.428. Std of Reward: 389.726. Training.
2020-06-30 20:19:01 INFO [stats.py:111] WalkerStatic: Step: 3360000. Time Elapsed: 3114.240 s Mean Reward: 715.976. Std of Reward: 349.535. Training.
2020-06-30 20:19:29 INFO [stats.py:111] WalkerStatic: Step: 3390000. Time Elapsed: 3142.595 s Mean Reward: 705.082. Std of Reward: 368.659. Training.
2020-06-30 20:19:53 INFO [stats.py:111] WalkerStatic: Step: 3420000. Time Elapsed: 3165.891 s Mean Reward: 808.040. Std of Reward: 266.790. Training.
2020-06-30 20:20:20 INFO [stats.py:111] WalkerStatic: Step: 3450000. Time Elapsed: 3193.585 s Mean Reward: 754.114. Std of Reward: 308.316. Training.
2020-06-30 20:20:48 INFO [stats.py:111] WalkerStatic: Step: 3480000. Time Elapsed: 3220.759 s Mean Reward: 786.721. Std of Reward: 339.834. Training.
2020-06-30 20:21:05 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 20:21:18 INFO [stats.py:111] WalkerStatic: Step: 3510000. Time Elapsed: 3251.295 s Mean Reward: 631.566. Std of Reward: 398.092. Training.
2020-06-30 20:21:42 INFO [stats.py:111] WalkerStatic: Step: 3540000. Time Elapsed: 3274.642 s Mean Reward: 786.503. Std of Reward: 349.192. Training.
2020-06-30 20:22:12 INFO [stats.py:111] WalkerStatic: Step: 3570000. Time Elapsed: 3305.185 s Mean Reward: 813.445. Std of Reward: 318.274. Training.
2020-06-30 20:22:34 INFO [stats.py:111] WalkerStatic: Step: 3600000. Time Elapsed: 3327.300 s Mean Reward: 758.293. Std of Reward: 366.304. Training.
2020-06-30 20:23:01 INFO [stats.py:111] WalkerStatic: Step: 3630000. Time Elapsed: 3354.130 s Mean Reward: 748.752. Std of Reward: 394.922. Training.
2020-06-30 20:23:28 INFO [stats.py:111] WalkerStatic: Step: 3660000. Time Elapsed: 3380.853 s Mean Reward: 715.293. Std of Reward: 415.671. Training.
2020-06-30 20:23:57 INFO [stats.py:111] WalkerStatic: Step: 3690000. Time Elapsed: 3409.906 s Mean Reward: 769.998. Std of Reward: 380.584. Training.
2020-06-30 20:24:22 INFO [stats.py:111] WalkerStatic: Step: 3720000. Time Elapsed: 3435.461 s Mean Reward: 803.616. Std of Reward: 382.603. Training.
2020-06-30 20:24:48 INFO [stats.py:111] WalkerStatic: Step: 3750000. Time Elapsed: 3461.570 s Mean Reward: 848.373. Std of Reward: 336.791. Training.
2020-06-30 20:25:18 INFO [stats.py:111] WalkerStatic: Step: 3780000. Time Elapsed: 3491.443 s Mean Reward: 795.140. Std of Reward: 381.411. Training.
2020-06-30 20:25:43 INFO [stats.py:111] WalkerStatic: Step: 3810000. Time Elapsed: 3516.026 s Mean Reward: 767.279. Std of Reward: 395.942. Training.
2020-06-30 20:26:07 INFO [stats.py:111] WalkerStatic: Step: 3840000. Time Elapsed: 3539.799 s Mean Reward: 827.257. Std of Reward: 357.127. Training.
2020-06-30 20:26:38 INFO [stats.py:111] WalkerStatic: Step: 3870000. Time Elapsed: 3570.969 s Mean Reward: 803.220. Std of Reward: 416.555. Training.
2020-06-30 20:27:00 INFO [stats.py:111] WalkerStatic: Step: 3900000. Time Elapsed: 3592.923 s Mean Reward: 741.874. Std of Reward: 420.047. Training.
2020-06-30 20:27:29 INFO [stats.py:111] WalkerStatic: Step: 3930000. Time Elapsed: 3622.503 s Mean Reward: 698.621. Std of Reward: 474.788. Training.
2020-06-30 20:27:57 INFO [stats.py:111] WalkerStatic: Step: 3960000. Time Elapsed: 3650.314 s Mean Reward: 692.519. Std of Reward: 471.561. Training.
2020-06-30 20:28:24 INFO [stats.py:111] WalkerStatic: Step: 3990000. Time Elapsed: 3677.127 s Mean Reward: 789.729. Std of Reward: 461.033. Training.
2020-06-30 20:28:31 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 20:28:50 INFO [stats.py:111] WalkerStatic: Step: 4020000. Time Elapsed: 3702.990 s Mean Reward: 708.060. Std of Reward: 467.209. Training.
2020-06-30 20:29:17 INFO [stats.py:111] WalkerStatic: Step: 4050000. Time Elapsed: 3729.725 s Mean Reward: 916.875. Std of Reward: 380.589. Training.
2020-06-30 20:29:43 INFO [stats.py:111] WalkerStatic: Step: 4080000. Time Elapsed: 3756.606 s Mean Reward: 934.349. Std of Reward: 361.608. Training.
2020-06-30 20:30:12 INFO [stats.py:111] WalkerStatic: Step: 4110000. Time Elapsed: 3785.230 s Mean Reward: 848.461. Std of Reward: 389.001. Training.
2020-06-30 20:30:41 INFO [stats.py:111] WalkerStatic: Step: 4140000. Time Elapsed: 3814.025 s Mean Reward: 795.601. Std of Reward: 482.101. Training.
2020-06-30 20:31:03 INFO [stats.py:111] WalkerStatic: Step: 4170000. Time Elapsed: 3836.219 s Mean Reward: 816.421. Std of Reward: 462.591. Training.
2020-06-30 20:31:30 INFO [stats.py:111] WalkerStatic: Step: 4200000. Time Elapsed: 3863.010 s Mean Reward: 749.915. Std of Reward: 468.740. Training.
2020-06-30 20:31:56 INFO [stats.py:111] WalkerStatic: Step: 4230000. Time Elapsed: 3889.303 s Mean Reward: 849.883. Std of Reward: 439.512. Training.
2020-06-30 20:32:25 INFO [stats.py:111] WalkerStatic: Step: 4260000. Time Elapsed: 3918.360 s Mean Reward: 993.627. Std of Reward: 348.000. Training.
2020-06-30 20:32:54 INFO [stats.py:111] WalkerStatic: Step: 4290000. Time Elapsed: 3946.704 s Mean Reward: 876.909. Std of Reward: 442.489. Training.
2020-06-30 20:33:16 INFO [stats.py:111] WalkerStatic: Step: 4320000. Time Elapsed: 3969.391 s Mean Reward: 714.736. Std of Reward: 524.187. Training.
2020-06-30 20:33:48 INFO [stats.py:111] WalkerStatic: Step: 4350000. Time Elapsed: 4000.673 s Mean Reward: 937.440. Std of Reward: 426.157. Training.
2020-06-30 20:34:13 INFO [stats.py:111] WalkerStatic: Step: 4380000. Time Elapsed: 4025.847 s Mean Reward: 915.237. Std of Reward: 457.031. Training.
2020-06-30 20:34:35 INFO [stats.py:111] WalkerStatic: Step: 4410000. Time Elapsed: 4048.021 s Mean Reward: 919.871. Std of Reward: 428.350. Training.
2020-06-30 20:35:05 INFO [stats.py:111] WalkerStatic: Step: 4440000. Time Elapsed: 4077.955 s Mean Reward: 854.516. Std of Reward: 449.379. Training.
2020-06-30 20:35:34 INFO [stats.py:111] WalkerStatic: Step: 4470000. Time Elapsed: 4106.908 s Mean Reward: 842.974. Std of Reward: 464.013. Training.
2020-06-30 20:36:01 INFO [stats.py:111] WalkerStatic: Step: 4500000. Time Elapsed: 4134.062 s Mean Reward: 899.301. Std of Reward: 477.660. Training.
2020-06-30 20:36:01 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 20:36:25 INFO [stats.py:111] WalkerStatic: Step: 4530000. Time Elapsed: 4158.267 s Mean Reward: 902.117. Std of Reward: 465.500. Training.
2020-06-30 20:36:55 INFO [stats.py:111] WalkerStatic: Step: 4560000. Time Elapsed: 4188.583 s Mean Reward: 837.099. Std of Reward: 512.324. Training.
2020-06-30 20:37:20 INFO [stats.py:111] WalkerStatic: Step: 4590000. Time Elapsed: 4212.896 s Mean Reward: 834.240. Std of Reward: 483.065. Training.
2020-06-30 20:37:46 INFO [stats.py:111] WalkerStatic: Step: 4620000. Time Elapsed: 4239.420 s Mean Reward: 830.825. Std of Reward: 507.058. Training.
2020-06-30 20:38:14 INFO [stats.py:111] WalkerStatic: Step: 4650000. Time Elapsed: 4266.911 s Mean Reward: 796.138. Std of Reward: 526.729. Training.
2020-06-30 20:38:37 INFO [stats.py:111] WalkerStatic: Step: 4680000. Time Elapsed: 4290.398 s Mean Reward: 724.706. Std of Reward: 550.408. Training.
2020-06-30 20:39:10 INFO [stats.py:111] WalkerStatic: Step: 4710000. Time Elapsed: 4323.076 s Mean Reward: 939.907. Std of Reward: 444.181. Training.
2020-06-30 20:39:35 INFO [stats.py:111] WalkerStatic: Step: 4740000. Time Elapsed: 4347.755 s Mean Reward: 678.427. Std of Reward: 548.807. Training.
2020-06-30 20:40:01 INFO [stats.py:111] WalkerStatic: Step: 4770000. Time Elapsed: 4374.081 s Mean Reward: 901.906. Std of Reward: 467.594. Training.
2020-06-30 20:40:29 INFO [stats.py:111] WalkerStatic: Step: 4800000. Time Elapsed: 4402.485 s Mean Reward: 805.021. Std of Reward: 506.840. Training.
2020-06-30 20:40:55 INFO [stats.py:111] WalkerStatic: Step: 4830000. Time Elapsed: 4428.283 s Mean Reward: 867.265. Std of Reward: 489.803. Training.
2020-06-30 20:41:23 INFO [stats.py:111] WalkerStatic: Step: 4860000. Time Elapsed: 4456.516 s Mean Reward: 838.652. Std of Reward: 493.995. Training.
2020-06-30 20:41:49 INFO [stats.py:111] WalkerStatic: Step: 4890000. Time Elapsed: 4481.667 s Mean Reward: 960.056. Std of Reward: 407.094. Training.
2020-06-30 20:42:17 INFO [stats.py:111] WalkerStatic: Step: 4920000. Time Elapsed: 4510.155 s Mean Reward: 854.624. Std of Reward: 531.717. Training.
2020-06-30 20:42:43 INFO [stats.py:111] WalkerStatic: Step: 4950000. Time Elapsed: 4536.407 s Mean Reward: 985.417. Std of Reward: 409.951. Training.
2020-06-30 20:43:14 INFO [stats.py:111] WalkerStatic: Step: 4980000. Time Elapsed: 4566.853 s Mean Reward: 917.940. Std of Reward: 508.176. Training.
2020-06-30 20:43:28 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 20:43:36 INFO [stats.py:111] WalkerStatic: Step: 5010000. Time Elapsed: 4588.724 s Mean Reward: 884.631. Std of Reward: 481.680. Training.
2020-06-30 20:44:09 INFO [stats.py:111] WalkerStatic: Step: 5040000. Time Elapsed: 4622.187 s Mean Reward: 986.016. Std of Reward: 467.681. Training.
2020-06-30 20:44:33 INFO [stats.py:111] WalkerStatic: Step: 5070000. Time Elapsed: 4645.723 s Mean Reward: 986.114. Std of Reward: 480.370. Training.
2020-06-30 20:44:59 INFO [stats.py:111] WalkerStatic: Step: 5100000. Time Elapsed: 4672.460 s Mean Reward: 1025.437. Std of Reward: 427.882. Training.
2020-06-30 20:45:27 INFO [stats.py:111] WalkerStatic: Step: 5130000. Time Elapsed: 4699.662 s Mean Reward: 978.472. Std of Reward: 471.743. Training.
2020-06-30 20:45:50 INFO [stats.py:111] WalkerStatic: Step: 5160000. Time Elapsed: 4723.103 s Mean Reward: 919.838. Std of Reward: 493.449. Training.
2020-06-30 20:46:21 INFO [stats.py:111] WalkerStatic: Step: 5190000. Time Elapsed: 4753.775 s Mean Reward: 845.363. Std of Reward: 506.586. Training.
2020-06-30 20:46:45 INFO [stats.py:111] WalkerStatic: Step: 5220000. Time Elapsed: 4777.827 s Mean Reward: 829.614. Std of Reward: 473.432. Training.
2020-06-30 20:47:14 INFO [stats.py:111] WalkerStatic: Step: 5250000. Time Elapsed: 4807.023 s Mean Reward: 762.202. Std of Reward: 515.511. Training.
2020-06-30 20:47:38 INFO [stats.py:111] WalkerStatic: Step: 5280000. Time Elapsed: 4831.036 s Mean Reward: 888.943. Std of Reward: 396.218. Training.
2020-06-30 20:48:10 INFO [stats.py:111] WalkerStatic: Step: 5310000. Time Elapsed: 4863.228 s Mean Reward: 760.388. Std of Reward: 516.672. Training.
2020-06-30 20:48:39 INFO [stats.py:111] WalkerStatic: Step: 5340000. Time Elapsed: 4892.504 s Mean Reward: 860.478. Std of Reward: 533.441. Training.
2020-06-30 20:48:57 INFO [stats.py:111] WalkerStatic: Step: 5370000. Time Elapsed: 4909.850 s Mean Reward: 856.560. Std of Reward: 506.877. Training.
2020-06-30 20:49:28 INFO [stats.py:111] WalkerStatic: Step: 5400000. Time Elapsed: 4941.246 s Mean Reward: 754.349. Std of Reward: 534.663. Training.
2020-06-30 20:49:56 INFO [stats.py:111] WalkerStatic: Step: 5430000. Time Elapsed: 4969.230 s Mean Reward: 782.830. Std of Reward: 523.242. Training.
2020-06-30 20:50:20 INFO [stats.py:111] WalkerStatic: Step: 5460000. Time Elapsed: 4993.444 s Mean Reward: 781.832. Std of Reward: 509.278. Training.
2020-06-30 20:50:42 INFO [stats.py:111] WalkerStatic: Step: 5490000. Time Elapsed: 5015.455 s Mean Reward: 794.680. Std of Reward: 455.716. Training.
2020-06-30 20:50:56 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 20:51:18 INFO [stats.py:111] WalkerStatic: Step: 5520000. Time Elapsed: 5051.231 s Mean Reward: 694.398. Std of Reward: 503.462. Training.
2020-06-30 20:51:40 INFO [stats.py:111] WalkerStatic: Step: 5550000. Time Elapsed: 5072.961 s Mean Reward: 793.103. Std of Reward: 504.894. Training.
2020-06-30 20:52:09 INFO [stats.py:111] WalkerStatic: Step: 5580000. Time Elapsed: 5102.391 s Mean Reward: 754.172. Std of Reward: 528.083. Training.
2020-06-30 20:52:41 INFO [stats.py:111] WalkerStatic: Step: 5610000. Time Elapsed: 5133.628 s Mean Reward: 859.653. Std of Reward: 514.074. Training.
2020-06-30 20:53:00 INFO [stats.py:111] WalkerStatic: Step: 5640000. Time Elapsed: 5152.662 s Mean Reward: 823.410. Std of Reward: 519.576. Training.
2020-06-30 20:53:34 INFO [stats.py:111] WalkerStatic: Step: 5670000. Time Elapsed: 5187.338 s Mean Reward: 743.176. Std of Reward: 543.429. Training.
2020-06-30 20:53:56 INFO [stats.py:111] WalkerStatic: Step: 5700000. Time Elapsed: 5208.911 s Mean Reward: 953.621. Std of Reward: 498.459. Training.
2020-06-30 20:54:23 INFO [stats.py:111] WalkerStatic: Step: 5730000. Time Elapsed: 5235.807 s Mean Reward: 885.845. Std of Reward: 517.938. Training.
2020-06-30 20:54:48 INFO [stats.py:111] WalkerStatic: Step: 5760000. Time Elapsed: 5261.559 s Mean Reward: 773.573. Std of Reward: 535.414. Training.
2020-06-30 20:55:20 INFO [stats.py:111] WalkerStatic: Step: 5790000. Time Elapsed: 5293.027 s Mean Reward: 815.579. Std of Reward: 563.446. Training.
2020-06-30 20:55:48 INFO [stats.py:111] WalkerStatic: Step: 5820000. Time Elapsed: 5321.401 s Mean Reward: 920.812. Std of Reward: 522.630. Training.
2020-06-30 20:56:12 INFO [stats.py:111] WalkerStatic: Step: 5850000. Time Elapsed: 5345.051 s Mean Reward: 971.265. Std of Reward: 490.654. Training.
2020-06-30 20:56:43 INFO [stats.py:111] WalkerStatic: Step: 5880000. Time Elapsed: 5375.845 s Mean Reward: 769.771. Std of Reward: 587.932. Training.
2020-06-30 20:57:03 INFO [stats.py:111] WalkerStatic: Step: 5910000. Time Elapsed: 5395.756 s Mean Reward: 876.737. Std of Reward: 558.941. Training.
2020-06-30 20:57:30 INFO [stats.py:111] WalkerStatic: Step: 5940000. Time Elapsed: 5422.620 s Mean Reward: 924.171. Std of Reward: 526.602. Training.
2020-06-30 20:57:59 INFO [stats.py:111] WalkerStatic: Step: 5970000. Time Elapsed: 5451.915 s Mean Reward: 913.114. Std of Reward: 519.652. Training.
2020-06-30 20:58:25 INFO [stats.py:111] WalkerStatic: Step: 6000000. Time Elapsed: 5477.643 s Mean Reward: 821.113. Std of Reward: 585.819. Training.
2020-06-30 20:58:25 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 20:58:56 INFO [stats.py:111] WalkerStatic: Step: 6030000. Time Elapsed: 5509.045 s Mean Reward: 819.122. Std of Reward: 602.541. Training.
2020-06-30 20:59:21 INFO [stats.py:111] WalkerStatic: Step: 6060000. Time Elapsed: 5534.550 s Mean Reward: 910.359. Std of Reward: 585.652. Training.
2020-06-30 20:59:51 INFO [stats.py:111] WalkerStatic: Step: 6090000. Time Elapsed: 5563.808 s Mean Reward: 920.135. Std of Reward: 578.094. Training.
2020-06-30 21:00:15 INFO [stats.py:111] WalkerStatic: Step: 6120000. Time Elapsed: 5588.388 s Mean Reward: 1140.184. Std of Reward: 390.066. Training.
2020-06-30 21:00:42 INFO [stats.py:111] WalkerStatic: Step: 6150000. Time Elapsed: 5615.136 s Mean Reward: 995.785. Std of Reward: 542.538. Training.
2020-06-30 21:01:13 INFO [stats.py:111] WalkerStatic: Step: 6180000. Time Elapsed: 5645.793 s Mean Reward: 921.010. Std of Reward: 567.587. Training.
2020-06-30 21:01:32 INFO [stats.py:111] WalkerStatic: Step: 6210000. Time Elapsed: 5665.213 s Mean Reward: 954.901. Std of Reward: 575.082. Training.
2020-06-30 21:02:04 INFO [stats.py:111] WalkerStatic: Step: 6240000. Time Elapsed: 5697.367 s Mean Reward: 1002.657. Std of Reward: 519.415. Training.
2020-06-30 21:02:32 INFO [stats.py:111] WalkerStatic: Step: 6270000. Time Elapsed: 5724.817 s Mean Reward: 790.452. Std of Reward: 598.974. Training.
2020-06-30 21:03:01 INFO [stats.py:111] WalkerStatic: Step: 6300000. Time Elapsed: 5754.094 s Mean Reward: 994.793. Std of Reward: 572.497. Training.
2020-06-30 21:03:26 INFO [stats.py:111] WalkerStatic: Step: 6330000. Time Elapsed: 5778.863 s Mean Reward: 1067.411. Std of Reward: 515.532. Training.
2020-06-30 21:03:50 INFO [stats.py:111] WalkerStatic: Step: 6360000. Time Elapsed: 5803.254 s Mean Reward: 878.450. Std of Reward: 616.766. Training.
2020-06-30 21:04:17 INFO [stats.py:111] WalkerStatic: Step: 6390000. Time Elapsed: 5829.948 s Mean Reward: 1120.522. Std of Reward: 438.030. Training.
2020-06-30 21:04:46 INFO [stats.py:111] WalkerStatic: Step: 6420000. Time Elapsed: 5859.150 s Mean Reward: 924.747. Std of Reward: 598.529. Training.
2020-06-30 21:05:15 INFO [stats.py:111] WalkerStatic: Step: 6450000. Time Elapsed: 5887.904 s Mean Reward: 1110.073. Std of Reward: 472.790. Training.
2020-06-30 21:05:37 INFO [stats.py:111] WalkerStatic: Step: 6480000. Time Elapsed: 5910.522 s Mean Reward: 950.121. Std of Reward: 611.821. Training.
2020-06-30 21:05:58 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 21:06:10 INFO [stats.py:111] WalkerStatic: Step: 6510000. Time Elapsed: 5942.866 s Mean Reward: 944.028. Std of Reward: 565.570. Training.
2020-06-30 21:06:31 INFO [stats.py:111] WalkerStatic: Step: 6540000. Time Elapsed: 5963.719 s Mean Reward: 728.876. Std of Reward: 608.874. Training.
2020-06-30 21:06:54 INFO [stats.py:111] WalkerStatic: Step: 6570000. Time Elapsed: 5986.952 s Mean Reward: 623.421. Std of Reward: 569.628. Training.
2020-06-30 21:07:25 INFO [stats.py:111] WalkerStatic: Step: 6600000. Time Elapsed: 6017.957 s Mean Reward: 637.785. Std of Reward: 539.740. Training.
2020-06-30 21:07:54 INFO [stats.py:111] WalkerStatic: Step: 6630000. Time Elapsed: 6046.930 s Mean Reward: 634.092. Std of Reward: 498.120. Training.
2020-06-30 21:08:24 INFO [stats.py:111] WalkerStatic: Step: 6660000. Time Elapsed: 6077.028 s Mean Reward: 847.329. Std of Reward: 578.983. Training.
2020-06-30 21:08:47 INFO [stats.py:111] WalkerStatic: Step: 6690000. Time Elapsed: 6100.594 s Mean Reward: 888.489. Std of Reward: 546.850. Training.
2020-06-30 21:09:17 INFO [stats.py:111] WalkerStatic: Step: 6720000. Time Elapsed: 6129.636 s Mean Reward: 802.601. Std of Reward: 572.174. Training.
2020-06-30 21:09:44 INFO [stats.py:111] WalkerStatic: Step: 6750000. Time Elapsed: 6157.157 s Mean Reward: 813.695. Std of Reward: 602.632. Training.
2020-06-30 21:10:09 INFO [stats.py:111] WalkerStatic: Step: 6780000. Time Elapsed: 6182.131 s Mean Reward: 850.982. Std of Reward: 589.685. Training.
2020-06-30 21:10:38 INFO [stats.py:111] WalkerStatic: Step: 6810000. Time Elapsed: 6211.504 s Mean Reward: 886.707. Std of Reward: 570.443. Training.
2020-06-30 21:11:07 INFO [stats.py:111] WalkerStatic: Step: 6840000. Time Elapsed: 6240.408 s Mean Reward: 926.033. Std of Reward: 547.368. Training.
2020-06-30 21:11:36 INFO [stats.py:111] WalkerStatic: Step: 6870000. Time Elapsed: 6269.156 s Mean Reward: 884.785. Std of Reward: 562.913. Training.
2020-06-30 21:12:02 INFO [stats.py:111] WalkerStatic: Step: 6900000. Time Elapsed: 6295.069 s Mean Reward: 912.979. Std of Reward: 584.309. Training.
2020-06-30 21:12:29 INFO [stats.py:111] WalkerStatic: Step: 6930000. Time Elapsed: 6322.326 s Mean Reward: 999.703. Std of Reward: 544.126. Training.
2020-06-30 21:12:55 INFO [stats.py:111] WalkerStatic: Step: 6960000. Time Elapsed: 6347.613 s Mean Reward: 843.568. Std of Reward: 607.020. Training.
2020-06-30 21:13:26 INFO [stats.py:111] WalkerStatic: Step: 6990000. Time Elapsed: 6378.819 s Mean Reward: 1104.782. Std of Reward: 507.565. Training.
2020-06-30 21:13:32 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 21:13:50 INFO [stats.py:111] WalkerStatic: Step: 7020000. Time Elapsed: 6403.360 s Mean Reward: 983.177. Std of Reward: 602.133. Training.
2020-06-30 21:14:13 INFO [stats.py:111] WalkerStatic: Step: 7050000. Time Elapsed: 6425.676 s Mean Reward: 1125.525. Std of Reward: 477.479. Training.
2020-06-30 21:14:45 INFO [stats.py:111] WalkerStatic: Step: 7080000. Time Elapsed: 6457.997 s Mean Reward: 820.526. Std of Reward: 602.924. Training.
2020-06-30 21:15:11 INFO [stats.py:111] WalkerStatic: Step: 7110000. Time Elapsed: 6484.303 s Mean Reward: 995.614. Std of Reward: 610.598. Training.
2020-06-30 21:15:40 INFO [stats.py:111] WalkerStatic: Step: 7140000. Time Elapsed: 6513.131 s Mean Reward: 1087.777. Std of Reward: 534.327. Training.
2020-06-30 21:16:06 INFO [stats.py:111] WalkerStatic: Step: 7170000. Time Elapsed: 6538.961 s Mean Reward: 867.888. Std of Reward: 654.275. Training.
2020-06-30 21:16:37 INFO [stats.py:111] WalkerStatic: Step: 7200000. Time Elapsed: 6569.764 s Mean Reward: 905.776. Std of Reward: 615.709. Training.
2020-06-30 21:16:58 INFO [stats.py:111] WalkerStatic: Step: 7230000. Time Elapsed: 6591.403 s Mean Reward: 935.079. Std of Reward: 652.690. Training.
2020-06-30 21:17:30 INFO [stats.py:111] WalkerStatic: Step: 7260000. Time Elapsed: 6623.179 s Mean Reward: 894.216. Std of Reward: 625.267. Training.
2020-06-30 21:17:57 INFO [stats.py:111] WalkerStatic: Step: 7290000. Time Elapsed: 6649.659 s Mean Reward: 1104.088. Std of Reward: 497.518. Training.
2020-06-30 21:18:19 INFO [stats.py:111] WalkerStatic: Step: 7320000. Time Elapsed: 6672.428 s Mean Reward: 994.810. Std of Reward: 600.243. Training.
2020-06-30 21:18:52 INFO [stats.py:111] WalkerStatic: Step: 7350000. Time Elapsed: 6704.808 s Mean Reward: 931.011. Std of Reward: 612.057. Training.
2020-06-30 21:19:13 INFO [stats.py:111] WalkerStatic: Step: 7380000. Time Elapsed: 6726.317 s Mean Reward: 933.729. Std of Reward: 594.078. Training.
2020-06-30 21:19:43 INFO [stats.py:111] WalkerStatic: Step: 7410000. Time Elapsed: 6756.058 s Mean Reward: 1062.214. Std of Reward: 544.540. Training.
2020-06-30 21:20:12 INFO [stats.py:111] WalkerStatic: Step: 7440000. Time Elapsed: 6784.923 s Mean Reward: 1060.665. Std of Reward: 558.984. Training.
2020-06-30 21:20:37 INFO [stats.py:111] WalkerStatic: Step: 7470000. Time Elapsed: 6809.731 s Mean Reward: 1128.730. Std of Reward: 510.811. Training.
2020-06-30 21:21:10 INFO [stats.py:111] WalkerStatic: Step: 7500000. Time Elapsed: 6843.416 s Mean Reward: 1017.245. Std of Reward: 583.079. Training.
2020-06-30 21:21:10 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 21:21:35 INFO [stats.py:111] WalkerStatic: Step: 7530000. Time Elapsed: 6868.461 s Mean Reward: 935.606. Std of Reward: 627.605. Training.
2020-06-30 21:22:03 INFO [stats.py:111] WalkerStatic: Step: 7560000. Time Elapsed: 6895.698 s Mean Reward: 849.436. Std of Reward: 642.292. Training.
2020-06-30 21:22:29 INFO [stats.py:111] WalkerStatic: Step: 7590000. Time Elapsed: 6922.365 s Mean Reward: 916.010. Std of Reward: 620.376. Training.
2020-06-30 21:22:56 INFO [stats.py:111] WalkerStatic: Step: 7620000. Time Elapsed: 6949.437 s Mean Reward: 821.924. Std of Reward: 656.160. Training.
2020-06-30 21:23:26 INFO [stats.py:111] WalkerStatic: Step: 7650000. Time Elapsed: 6978.917 s Mean Reward: 1238.552. Std of Reward: 411.586. Training.
2020-06-30 21:23:55 INFO [stats.py:111] WalkerStatic: Step: 7680000. Time Elapsed: 7008.488 s Mean Reward: 1134.569. Std of Reward: 557.916. Training.
2020-06-30 21:24:16 INFO [stats.py:111] WalkerStatic: Step: 7710000. Time Elapsed: 7029.216 s Mean Reward: 1047.074. Std of Reward: 624.268. Training.
2020-06-30 21:24:44 INFO [stats.py:111] WalkerStatic: Step: 7740000. Time Elapsed: 7057.199 s Mean Reward: 988.247. Std of Reward: 615.464. Training.
2020-06-30 21:25:15 INFO [stats.py:111] WalkerStatic: Step: 7770000. Time Elapsed: 7087.727 s Mean Reward: 865.824. Std of Reward: 645.312. Training.
2020-06-30 21:25:36 INFO [stats.py:111] WalkerStatic: Step: 7800000. Time Elapsed: 7109.319 s Mean Reward: 1200.431. Std of Reward: 512.863. Training.
2020-06-30 21:26:06 INFO [stats.py:111] WalkerStatic: Step: 7830000. Time Elapsed: 7139.524 s Mean Reward: 1102.218. Std of Reward: 577.406. Training.
2020-06-30 21:26:35 INFO [stats.py:111] WalkerStatic: Step: 7860000. Time Elapsed: 7168.057 s Mean Reward: 981.908. Std of Reward: 622.082. Training.
2020-06-30 21:27:00 INFO [stats.py:111] WalkerStatic: Step: 7890000. Time Elapsed: 7192.929 s Mean Reward: 1037.648. Std of Reward: 612.406. Training.
2020-06-30 21:27:30 INFO [stats.py:111] WalkerStatic: Step: 7920000. Time Elapsed: 7222.656 s Mean Reward: 1329.782. Std of Reward: 368.479. Training.
2020-06-30 21:27:56 INFO [stats.py:111] WalkerStatic: Step: 7950000. Time Elapsed: 7249.454 s Mean Reward: 1102.971. Std of Reward: 601.367. Training.
2020-06-30 21:28:23 INFO [stats.py:111] WalkerStatic: Step: 7980000. Time Elapsed: 7276.360 s Mean Reward: 1106.121. Std of Reward: 572.519. Training.
2020-06-30 21:28:44 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 21:28:52 INFO [stats.py:111] WalkerStatic: Step: 8010000. Time Elapsed: 7305.469 s Mean Reward: 1050.371. Std of Reward: 624.204. Training.
2020-06-30 21:29:16 INFO [stats.py:111] WalkerStatic: Step: 8040000. Time Elapsed: 7328.709 s Mean Reward: 1058.377. Std of Reward: 589.351. Training.
2020-06-30 21:29:43 INFO [stats.py:111] WalkerStatic: Step: 8070000. Time Elapsed: 7356.151 s Mean Reward: 1020.440. Std of Reward: 628.843. Training.
2020-06-30 21:30:12 INFO [stats.py:111] WalkerStatic: Step: 8100000. Time Elapsed: 7384.938 s Mean Reward: 1072.168. Std of Reward: 591.283. Training.
2020-06-30 21:30:40 INFO [stats.py:111] WalkerStatic: Step: 8130000. Time Elapsed: 7413.362 s Mean Reward: 1080.431. Std of Reward: 619.632. Training.
2020-06-30 21:31:09 INFO [stats.py:111] WalkerStatic: Step: 8160000. Time Elapsed: 7441.667 s Mean Reward: 1054.807. Std of Reward: 603.345. Training.
2020-06-30 21:31:33 INFO [stats.py:111] WalkerStatic: Step: 8190000. Time Elapsed: 7466.141 s Mean Reward: 1168.881. Std of Reward: 573.655. Training.
2020-06-30 21:32:01 INFO [stats.py:111] WalkerStatic: Step: 8220000. Time Elapsed: 7494.492 s Mean Reward: 1097.289. Std of Reward: 604.471. Training.
2020-06-30 21:32:29 INFO [stats.py:111] WalkerStatic: Step: 8250000. Time Elapsed: 7522.326 s Mean Reward: 1127.763. Std of Reward: 614.503. Training.
2020-06-30 21:32:56 INFO [stats.py:111] WalkerStatic: Step: 8280000. Time Elapsed: 7549.545 s Mean Reward: 905.999. Std of Reward: 677.737. Training.
2020-06-30 21:33:22 INFO [stats.py:111] WalkerStatic: Step: 8310000. Time Elapsed: 7574.618 s Mean Reward: 985.769. Std of Reward: 651.984. Training.
2020-06-30 21:33:54 INFO [stats.py:111] WalkerStatic: Step: 8340000. Time Elapsed: 7607.541 s Mean Reward: 954.430. Std of Reward: 670.575. Training.
2020-06-30 21:34:15 INFO [stats.py:111] WalkerStatic: Step: 8370000. Time Elapsed: 7628.148 s Mean Reward: 1161.428. Std of Reward: 587.561. Training.
2020-06-30 21:34:48 INFO [stats.py:111] WalkerStatic: Step: 8400000. Time Elapsed: 7661.414 s Mean Reward: 1176.025. Std of Reward: 559.002. Training.
2020-06-30 21:35:14 INFO [stats.py:111] WalkerStatic: Step: 8430000. Time Elapsed: 7686.791 s Mean Reward: 1240.448. Std of Reward: 497.823. Training.
2020-06-30 21:35:38 INFO [stats.py:111] WalkerStatic: Step: 8460000. Time Elapsed: 7710.638 s Mean Reward: 1105.568. Std of Reward: 626.568. Training.
2020-06-30 21:36:09 INFO [stats.py:111] WalkerStatic: Step: 8490000. Time Elapsed: 7742.194 s Mean Reward: 1196.723. Std of Reward: 532.497. Training.
2020-06-30 21:36:18 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 21:36:34 INFO [stats.py:111] WalkerStatic: Step: 8520000. Time Elapsed: 7766.711 s Mean Reward: 1026.382. Std of Reward: 647.525. Training.
2020-06-30 21:37:05 INFO [stats.py:111] WalkerStatic: Step: 8550000. Time Elapsed: 7798.225 s Mean Reward: 1139.915. Std of Reward: 619.155. Training.
2020-06-30 21:37:32 INFO [stats.py:111] WalkerStatic: Step: 8580000. Time Elapsed: 7824.942 s Mean Reward: 1119.692. Std of Reward: 623.215. Training.
2020-06-30 21:37:59 INFO [stats.py:111] WalkerStatic: Step: 8610000. Time Elapsed: 7852.416 s Mean Reward: 1138.586. Std of Reward: 590.625. Training.
2020-06-30 21:38:23 INFO [stats.py:111] WalkerStatic: Step: 8640000. Time Elapsed: 7876.214 s Mean Reward: 1060.899. Std of Reward: 617.781. Training.
2020-06-30 21:38:54 INFO [stats.py:111] WalkerStatic: Step: 8670000. Time Elapsed: 7907.180 s Mean Reward: 1089.002. Std of Reward: 590.697. Training.
2020-06-30 21:39:20 INFO [stats.py:111] WalkerStatic: Step: 8700000. Time Elapsed: 7932.631 s Mean Reward: 1118.599. Std of Reward: 638.288. Training.
2020-06-30 21:39:50 INFO [stats.py:111] WalkerStatic: Step: 8730000. Time Elapsed: 7963.154 s Mean Reward: 1097.612. Std of Reward: 641.831. Training.
2020-06-30 21:40:19 INFO [stats.py:111] WalkerStatic: Step: 8760000. Time Elapsed: 7991.840 s Mean Reward: 1217.995. Std of Reward: 598.542. Training.
2020-06-30 21:40:41 INFO [stats.py:111] WalkerStatic: Step: 8790000. Time Elapsed: 8013.913 s Mean Reward: 1086.657. Std of Reward: 652.167. Training.
2020-06-30 21:41:10 INFO [stats.py:111] WalkerStatic: Step: 8820000. Time Elapsed: 8043.313 s Mean Reward: 986.149. Std of Reward: 675.947. Training.
2020-06-30 21:41:38 INFO [stats.py:111] WalkerStatic: Step: 8850000. Time Elapsed: 8070.898 s Mean Reward: 1082.765. Std of Reward: 631.365. Training.
2020-06-30 21:42:08 INFO [stats.py:111] WalkerStatic: Step: 8880000. Time Elapsed: 8101.334 s Mean Reward: 1090.673. Std of Reward: 640.988. Training.
2020-06-30 21:42:31 INFO [stats.py:111] WalkerStatic: Step: 8910000. Time Elapsed: 8123.714 s Mean Reward: 1221.507. Std of Reward: 548.462. Training.
2020-06-30 21:43:05 INFO [stats.py:111] WalkerStatic: Step: 8940000. Time Elapsed: 8158.500 s Mean Reward: 1233.089. Std of Reward: 541.084. Training.
2020-06-30 21:43:32 INFO [stats.py:111] WalkerStatic: Step: 8970000. Time Elapsed: 8184.924 s Mean Reward: 1372.001. Std of Reward: 439.842. Training.
2020-06-30 21:43:58 INFO [stats.py:111] WalkerStatic: Step: 9000000. Time Elapsed: 8210.862 s Mean Reward: 1142.388. Std of Reward: 644.929. Training.
2020-06-30 21:43:58 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 21:44:28 INFO [stats.py:111] WalkerStatic: Step: 9030000. Time Elapsed: 8240.754 s Mean Reward: 1070.747. Std of Reward: 666.221. Training.
2020-06-30 21:44:52 INFO [stats.py:111] WalkerStatic: Step: 9060000. Time Elapsed: 8264.851 s Mean Reward: 1097.109. Std of Reward: 653.885. Training.
2020-06-30 21:45:17 INFO [stats.py:111] WalkerStatic: Step: 9090000. Time Elapsed: 8289.747 s Mean Reward: 1012.783. Std of Reward: 687.917. Training.
2020-06-30 21:45:52 INFO [stats.py:111] WalkerStatic: Step: 9120000. Time Elapsed: 8325.089 s Mean Reward: 1292.299. Std of Reward: 488.190. Training.
2020-06-30 21:46:16 INFO [stats.py:111] WalkerStatic: Step: 9150000. Time Elapsed: 8349.542 s Mean Reward: 1106.709. Std of Reward: 670.416. Training.
2020-06-30 21:46:45 INFO [stats.py:111] WalkerStatic: Step: 9180000. Time Elapsed: 8377.641 s Mean Reward: 1021.604. Std of Reward: 706.262. Training.
2020-06-30 21:47:13 INFO [stats.py:111] WalkerStatic: Step: 9210000. Time Elapsed: 8406.559 s Mean Reward: 1097.209. Std of Reward: 647.446. Training.
2020-06-30 21:47:41 INFO [stats.py:111] WalkerStatic: Step: 9240000. Time Elapsed: 8434.466 s Mean Reward: 1134.684. Std of Reward: 639.684. Training.
2020-06-30 21:48:06 INFO [stats.py:111] WalkerStatic: Step: 9270000. Time Elapsed: 8459.156 s Mean Reward: 1322.188. Std of Reward: 503.533. Training.
2020-06-30 21:48:37 INFO [stats.py:111] WalkerStatic: Step: 9300000. Time Elapsed: 8489.725 s Mean Reward: 1163.623. Std of Reward: 610.518. Training.
2020-06-30 21:49:01 INFO [stats.py:111] WalkerStatic: Step: 9330000. Time Elapsed: 8514.420 s Mean Reward: 1382.915. Std of Reward: 503.681. Training.
2020-06-30 21:49:28 INFO [stats.py:111] WalkerStatic: Step: 9360000. Time Elapsed: 8541.389 s Mean Reward: 1284.084. Std of Reward: 595.225. Training.
2020-06-30 21:49:58 INFO [stats.py:111] WalkerStatic: Step: 9390000. Time Elapsed: 8570.982 s Mean Reward: 1005.383. Std of Reward: 716.809. Training.
2020-06-30 21:50:25 INFO [stats.py:111] WalkerStatic: Step: 9420000. Time Elapsed: 8598.108 s Mean Reward: 971.835. Std of Reward: 735.050. Training.
2020-06-30 21:50:54 INFO [stats.py:111] WalkerStatic: Step: 9450000. Time Elapsed: 8626.734 s Mean Reward: 1149.526. Std of Reward: 658.153. Training.
2020-06-30 21:51:19 INFO [stats.py:111] WalkerStatic: Step: 9480000. Time Elapsed: 8652.494 s Mean Reward: 1323.949. Std of Reward: 484.389. Training.
2020-06-30 21:51:39 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 21:51:46 INFO [stats.py:111] WalkerStatic: Step: 9510000. Time Elapsed: 8679.264 s Mean Reward: 1148.520. Std of Reward: 651.046. Training.
2020-06-30 21:52:17 INFO [stats.py:111] WalkerStatic: Step: 9540000. Time Elapsed: 8710.449 s Mean Reward: 1239.330. Std of Reward: 554.340. Training.
2020-06-30 21:52:41 INFO [stats.py:111] WalkerStatic: Step: 9570000. Time Elapsed: 8734.019 s Mean Reward: 1057.537. Std of Reward: 682.750. Training.
2020-06-30 21:53:15 INFO [stats.py:111] WalkerStatic: Step: 9600000. Time Elapsed: 8768.249 s Mean Reward: 1105.429. Std of Reward: 665.646. Training.
2020-06-30 21:53:37 INFO [stats.py:111] WalkerStatic: Step: 9630000. Time Elapsed: 8790.344 s Mean Reward: 1222.821. Std of Reward: 628.590. Training.
2020-06-30 21:54:10 INFO [stats.py:111] WalkerStatic: Step: 9660000. Time Elapsed: 8823.430 s Mean Reward: 1057.998. Std of Reward: 660.291. Training.
2020-06-30 21:54:37 INFO [stats.py:111] WalkerStatic: Step: 9690000. Time Elapsed: 8849.618 s Mean Reward: 1143.565. Std of Reward: 630.023. Training.
2020-06-30 21:54:59 INFO [stats.py:111] WalkerStatic: Step: 9720000. Time Elapsed: 8872.057 s Mean Reward: 1127.939. Std of Reward: 646.437. Training.
2020-06-30 21:55:30 INFO [stats.py:111] WalkerStatic: Step: 9750000. Time Elapsed: 8902.724 s Mean Reward: 1128.958. Std of Reward: 649.822. Training.
2020-06-30 21:55:57 INFO [stats.py:111] WalkerStatic: Step: 9780000. Time Elapsed: 8930.510 s Mean Reward: 1183.793. Std of Reward: 631.638. Training.
2020-06-30 21:56:24 INFO [stats.py:111] WalkerStatic: Step: 9810000. Time Elapsed: 8957.427 s Mean Reward: 1150.850. Std of Reward: 652.685. Training.
2020-06-30 21:56:52 INFO [stats.py:111] WalkerStatic: Step: 9840000. Time Elapsed: 8984.891 s Mean Reward: 1076.389. Std of Reward: 689.238. Training.
2020-06-30 21:57:22 INFO [stats.py:111] WalkerStatic: Step: 9870000. Time Elapsed: 9015.215 s Mean Reward: 1000.898. Std of Reward: 663.810. Training.
2020-06-30 21:57:49 INFO [stats.py:111] WalkerStatic: Step: 9900000. Time Elapsed: 9042.303 s Mean Reward: 1205.663. Std of Reward: 601.752. Training.
2020-06-30 21:58:14 INFO [stats.py:111] WalkerStatic: Step: 9930000. Time Elapsed: 9067.115 s Mean Reward: 1249.738. Std of Reward: 576.962. Training.
2020-06-30 21:58:43 INFO [stats.py:111] WalkerStatic: Step: 9960000. Time Elapsed: 9095.712 s Mean Reward: 1154.428. Std of Reward: 654.826. Training.
2020-06-30 21:59:09 INFO [stats.py:111] WalkerStatic: Step: 9990000. Time Elapsed: 9122.608 s Mean Reward: 1242.038. Std of Reward: 579.086. Training.
2020-06-30 21:59:23 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 21:59:43 INFO [stats.py:111] WalkerStatic: Step: 10020000. Time Elapsed: 9156.264 s Mean Reward: 1171.305. Std of Reward: 674.328. Training.
2020-06-30 22:00:04 INFO [stats.py:111] WalkerStatic: Step: 10050000. Time Elapsed: 9176.853 s Mean Reward: 1349.881. Std of Reward: 559.706. Training.
2020-06-30 22:00:37 INFO [stats.py:111] WalkerStatic: Step: 10080000. Time Elapsed: 9209.999 s Mean Reward: 1030.916. Std of Reward: 735.853. Training.
2020-06-30 22:01:03 INFO [stats.py:111] WalkerStatic: Step: 10110000. Time Elapsed: 9235.668 s Mean Reward: 1180.584. Std of Reward: 623.497. Training.
2020-06-30 22:01:28 INFO [stats.py:111] WalkerStatic: Step: 10140000. Time Elapsed: 9260.815 s Mean Reward: 1312.486. Std of Reward: 595.197. Training.
2020-06-30 22:01:56 INFO [stats.py:111] WalkerStatic: Step: 10170000. Time Elapsed: 9288.645 s Mean Reward: 1137.880. Std of Reward: 636.873. Training.
2020-06-30 22:02:28 INFO [stats.py:111] WalkerStatic: Step: 10200000. Time Elapsed: 9321.050 s Mean Reward: 1129.309. Std of Reward: 681.620. Training.
2020-06-30 22:02:55 INFO [stats.py:111] WalkerStatic: Step: 10230000. Time Elapsed: 9348.551 s Mean Reward: 1318.006. Std of Reward: 571.861. Training.
2020-06-30 22:03:21 INFO [stats.py:111] WalkerStatic: Step: 10260000. Time Elapsed: 9374.198 s Mean Reward: 1320.640. Std of Reward: 556.957. Training.
2020-06-30 22:03:57 INFO [stats.py:111] WalkerStatic: Step: 10290000. Time Elapsed: 9410.402 s Mean Reward: 1288.196. Std of Reward: 601.388. Training.
2020-06-30 22:04:16 INFO [stats.py:111] WalkerStatic: Step: 10320000. Time Elapsed: 9429.245 s Mean Reward: 1290.955. Std of Reward: 617.154. Training.
2020-06-30 22:04:48 INFO [stats.py:111] WalkerStatic: Step: 10350000. Time Elapsed: 9460.831 s Mean Reward: 1104.580. Std of Reward: 701.250. Training.
2020-06-30 22:05:18 INFO [stats.py:111] WalkerStatic: Step: 10380000. Time Elapsed: 9490.839 s Mean Reward: 1177.991. Std of Reward: 648.758. Training.
2020-06-30 22:05:39 INFO [stats.py:111] WalkerStatic: Step: 10410000. Time Elapsed: 9512.361 s Mean Reward: 1210.766. Std of Reward: 657.343. Training.
2020-06-30 22:06:12 INFO [stats.py:111] WalkerStatic: Step: 10440000. Time Elapsed: 9544.809 s Mean Reward: 1140.072. Std of Reward: 667.789. Training.
2020-06-30 22:06:37 INFO [stats.py:111] WalkerStatic: Step: 10470000. Time Elapsed: 9569.716 s Mean Reward: 1128.914. Std of Reward: 679.007. Training.
2020-06-30 22:07:05 INFO [stats.py:111] WalkerStatic: Step: 10500000. Time Elapsed: 9598.385 s Mean Reward: 1171.259. Std of Reward: 664.824. Training.
2020-06-30 22:07:05 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 22:07:35 INFO [stats.py:111] WalkerStatic: Step: 10530000. Time Elapsed: 9628.504 s Mean Reward: 1003.826. Std of Reward: 729.168. Training.
2020-06-30 22:08:00 INFO [stats.py:111] WalkerStatic: Step: 10560000. Time Elapsed: 9653.313 s Mean Reward: 1344.139. Std of Reward: 569.249. Training.
2020-06-30 22:08:24 INFO [stats.py:111] WalkerStatic: Step: 10590000. Time Elapsed: 9676.677 s Mean Reward: 1222.825. Std of Reward: 634.372. Training.
2020-06-30 22:08:56 INFO [stats.py:111] WalkerStatic: Step: 10620000. Time Elapsed: 9708.746 s Mean Reward: 1082.037. Std of Reward: 717.809. Training.
2020-06-30 22:09:24 INFO [stats.py:111] WalkerStatic: Step: 10650000. Time Elapsed: 9736.705 s Mean Reward: 1401.389. Std of Reward: 523.064. Training.
2020-06-30 22:09:51 INFO [stats.py:111] WalkerStatic: Step: 10680000. Time Elapsed: 9763.664 s Mean Reward: 1433.927. Std of Reward: 441.604. Training.
2020-06-30 22:10:19 INFO [stats.py:111] WalkerStatic: Step: 10710000. Time Elapsed: 9792.218 s Mean Reward: 1169.476. Std of Reward: 666.483. Training.
2020-06-30 22:10:49 INFO [stats.py:111] WalkerStatic: Step: 10740000. Time Elapsed: 9822.337 s Mean Reward: 1403.310. Std of Reward: 541.374. Training.
2020-06-30 22:11:10 INFO [stats.py:111] WalkerStatic: Step: 10770000. Time Elapsed: 9843.530 s Mean Reward: 1299.283. Std of Reward: 638.997. Training.
2020-06-30 22:11:25 INFO [stats.py:111] WalkerStatic: Step: 10800000. Time Elapsed: 9858.135 s Mean Reward: 745.449. Std of Reward: 559.242. Training.
2020-06-30 22:11:48 INFO [stats.py:111] WalkerStatic: Step: 10830000. Time Elapsed: 9881.305 s Mean Reward: 286.324. Std of Reward: 363.646. Training.
2020-06-30 22:12:19 INFO [stats.py:111] WalkerStatic: Step: 10860000. Time Elapsed: 9911.897 s Mean Reward: 255.216. Std of Reward: 254.281. Training.
2020-06-30 22:12:52 INFO [stats.py:111] WalkerStatic: Step: 10890000. Time Elapsed: 9945.441 s Mean Reward: 305.738. Std of Reward: 292.048. Training.
2020-06-30 22:13:23 INFO [stats.py:111] WalkerStatic: Step: 10920000. Time Elapsed: 9976.224 s Mean Reward: 444.573. Std of Reward: 439.194. Training.
2020-06-30 22:13:51 INFO [stats.py:111] WalkerStatic: Step: 10950000. Time Elapsed: 10004.014 s Mean Reward: 524.074. Std of Reward: 443.389. Training.
2020-06-30 22:14:23 INFO [stats.py:111] WalkerStatic: Step: 10980000. Time Elapsed: 10035.615 s Mean Reward: 555.027. Std of Reward: 524.054. Training.
2020-06-30 22:14:43 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 22:14:51 INFO [stats.py:111] WalkerStatic: Step: 11010000. Time Elapsed: 10064.330 s Mean Reward: 724.688. Std of Reward: 528.555. Training.
2020-06-30 22:15:19 INFO [stats.py:111] WalkerStatic: Step: 11040000. Time Elapsed: 10091.726 s Mean Reward: 768.644. Std of Reward: 541.227. Training.
2020-06-30 22:15:48 INFO [stats.py:111] WalkerStatic: Step: 11070000. Time Elapsed: 10121.569 s Mean Reward: 847.493. Std of Reward: 616.837. Training.
2020-06-30 22:16:16 INFO [stats.py:111] WalkerStatic: Step: 11100000. Time Elapsed: 10149.143 s Mean Reward: 817.890. Std of Reward: 650.400. Training.
2020-06-30 22:16:46 INFO [stats.py:111] WalkerStatic: Step: 11130000. Time Elapsed: 10178.867 s Mean Reward: 1046.910. Std of Reward: 608.761. Training.
2020-06-30 22:17:14 INFO [stats.py:111] WalkerStatic: Step: 11160000. Time Elapsed: 10207.576 s Mean Reward: 1018.397. Std of Reward: 653.054. Training.
2020-06-30 22:17:48 INFO [stats.py:111] WalkerStatic: Step: 11190000. Time Elapsed: 10240.834 s Mean Reward: 1120.295. Std of Reward: 629.956. Training.
2020-06-30 22:18:11 INFO [stats.py:111] WalkerStatic: Step: 11220000. Time Elapsed: 10264.163 s Mean Reward: 1265.188. Std of Reward: 583.617. Training.
2020-06-30 22:18:44 INFO [stats.py:111] WalkerStatic: Step: 11250000. Time Elapsed: 10296.758 s Mean Reward: 1182.010. Std of Reward: 628.993. Training.
2020-06-30 22:19:12 INFO [stats.py:111] WalkerStatic: Step: 11280000. Time Elapsed: 10324.648 s Mean Reward: 1204.276. Std of Reward: 600.449. Training.
2020-06-30 22:19:33 INFO [stats.py:111] WalkerStatic: Step: 11310000. Time Elapsed: 10346.090 s Mean Reward: 1131.816. Std of Reward: 604.899. Training.
2020-06-30 22:20:07 INFO [stats.py:111] WalkerStatic: Step: 11340000. Time Elapsed: 10380.309 s Mean Reward: 1267.740. Std of Reward: 591.183. Training.
2020-06-30 22:20:33 INFO [stats.py:111] WalkerStatic: Step: 11370000. Time Elapsed: 10406.004 s Mean Reward: 1367.546. Std of Reward: 503.576. Training.
2020-06-30 22:21:00 INFO [stats.py:111] WalkerStatic: Step: 11400000. Time Elapsed: 10433.211 s Mean Reward: 1184.914. Std of Reward: 672.984. Training.
2020-06-30 22:21:28 INFO [stats.py:111] WalkerStatic: Step: 11430000. Time Elapsed: 10461.159 s Mean Reward: 1205.037. Std of Reward: 650.904. Training.
2020-06-30 22:21:55 INFO [stats.py:111] WalkerStatic: Step: 11460000. Time Elapsed: 10487.961 s Mean Reward: 1203.021. Std of Reward: 615.377. Training.
2020-06-30 22:22:24 INFO [stats.py:111] WalkerStatic: Step: 11490000. Time Elapsed: 10516.933 s Mean Reward: 1413.694. Std of Reward: 468.620. Training.
2020-06-30 22:22:32 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 22:22:50 INFO [stats.py:111] WalkerStatic: Step: 11520000. Time Elapsed: 10542.922 s Mean Reward: 1031.773. Std of Reward: 723.390. Training.
2020-06-30 22:23:21 INFO [stats.py:111] WalkerStatic: Step: 11550000. Time Elapsed: 10574.460 s Mean Reward: 1404.145. Std of Reward: 532.441. Training.
2020-06-30 22:23:49 INFO [stats.py:111] WalkerStatic: Step: 11580000. Time Elapsed: 10602.014 s Mean Reward: 1275.685. Std of Reward: 618.707. Training.
2020-06-30 22:24:15 INFO [stats.py:111] WalkerStatic: Step: 11610000. Time Elapsed: 10628.131 s Mean Reward: 1307.668. Std of Reward: 631.628. Training.
2020-06-30 22:24:46 INFO [stats.py:111] WalkerStatic: Step: 11640000. Time Elapsed: 10658.935 s Mean Reward: 1403.436. Std of Reward: 480.662. Training.
2020-06-30 22:25:15 INFO [stats.py:111] WalkerStatic: Step: 11670000. Time Elapsed: 10688.282 s Mean Reward: 1175.781. Std of Reward: 671.617. Training.
2020-06-30 22:25:33 INFO [stats.py:111] WalkerStatic: Step: 11700000. Time Elapsed: 10706.315 s Mean Reward: 1385.721. Std of Reward: 550.709. Training.
2020-06-30 22:26:07 INFO [stats.py:111] WalkerStatic: Step: 11730000. Time Elapsed: 10740.309 s Mean Reward: 1212.913. Std of Reward: 616.852. Training.
2020-06-30 22:26:32 INFO [stats.py:111] WalkerStatic: Step: 11760000. Time Elapsed: 10765.422 s Mean Reward: 1309.115. Std of Reward: 605.996. Training.
2020-06-30 22:27:02 INFO [stats.py:111] WalkerStatic: Step: 11790000. Time Elapsed: 10795.355 s Mean Reward: 1212.676. Std of Reward: 676.821. Training.
2020-06-30 22:27:31 INFO [stats.py:111] WalkerStatic: Step: 11820000. Time Elapsed: 10823.972 s Mean Reward: 1183.765. Std of Reward: 715.772. Training.
2020-06-30 22:27:57 INFO [stats.py:111] WalkerStatic: Step: 11850000. Time Elapsed: 10850.024 s Mean Reward: 1332.388. Std of Reward: 599.170. Training.
2020-06-30 22:28:30 INFO [stats.py:111] WalkerStatic: Step: 11880000. Time Elapsed: 10882.755 s Mean Reward: 1350.344. Std of Reward: 568.326. Training.
2020-06-30 22:28:54 INFO [stats.py:111] WalkerStatic: Step: 11910000. Time Elapsed: 10907.304 s Mean Reward: 1257.036. Std of Reward: 629.801. Training.
2020-06-30 22:29:22 INFO [stats.py:111] WalkerStatic: Step: 11940000. Time Elapsed: 10935.303 s Mean Reward: 1326.643. Std of Reward: 592.136. Training.
2020-06-30 22:29:50 INFO [stats.py:111] WalkerStatic: Step: 11970000. Time Elapsed: 10962.711 s Mean Reward: 1141.024. Std of Reward: 692.002. Training.
2020-06-30 22:30:22 INFO [stats.py:111] WalkerStatic: Step: 12000000. Time Elapsed: 10995.435 s Mean Reward: 1333.692. Std of Reward: 622.435. Training.
2020-06-30 22:30:22 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 22:30:49 INFO [stats.py:111] WalkerStatic: Step: 12030000. Time Elapsed: 11021.971 s Mean Reward: 1200.916. Std of Reward: 699.175. Training.
2020-06-30 22:31:12 INFO [stats.py:111] WalkerStatic: Step: 12060000. Time Elapsed: 11045.141 s Mean Reward: 1248.229. Std of Reward: 674.932. Training.
2020-06-30 22:31:44 INFO [stats.py:111] WalkerStatic: Step: 12090000. Time Elapsed: 11076.737 s Mean Reward: 1191.460. Std of Reward: 702.562. Training.
2020-06-30 22:32:10 INFO [stats.py:111] WalkerStatic: Step: 12120000. Time Elapsed: 11103.414 s Mean Reward: 1417.067. Std of Reward: 533.159. Training.
2020-06-30 22:32:38 INFO [stats.py:111] WalkerStatic: Step: 12150000. Time Elapsed: 11130.809 s Mean Reward: 1460.154. Std of Reward: 536.547. Training.
2020-06-30 22:33:10 INFO [stats.py:111] WalkerStatic: Step: 12180000. Time Elapsed: 11163.333 s Mean Reward: 1395.739. Std of Reward: 551.556. Training.
2020-06-30 22:33:32 INFO [stats.py:111] WalkerStatic: Step: 12210000. Time Elapsed: 11185.246 s Mean Reward: 1327.262. Std of Reward: 666.464. Training.
2020-06-30 22:34:05 INFO [stats.py:111] WalkerStatic: Step: 12240000. Time Elapsed: 11218.409 s Mean Reward: 1474.079. Std of Reward: 537.271. Training.
2020-06-30 22:34:29 INFO [stats.py:111] WalkerStatic: Step: 12270000. Time Elapsed: 11241.912 s Mean Reward: 1227.069. Std of Reward: 691.583. Training.
2020-06-30 22:34:57 INFO [stats.py:111] WalkerStatic: Step: 12300000. Time Elapsed: 11269.661 s Mean Reward: 1343.907. Std of Reward: 564.073. Training.
2020-06-30 22:35:28 INFO [stats.py:111] WalkerStatic: Step: 12330000. Time Elapsed: 11301.520 s Mean Reward: 1358.789. Std of Reward: 610.843. Training.
2020-06-30 22:35:51 INFO [stats.py:111] WalkerStatic: Step: 12360000. Time Elapsed: 11324.435 s Mean Reward: 1456.166. Std of Reward: 506.792. Training.
2020-06-30 22:36:25 INFO [stats.py:111] WalkerStatic: Step: 12390000. Time Elapsed: 11358.297 s Mean Reward: 1225.108. Std of Reward: 696.915. Training.
2020-06-30 22:36:52 INFO [stats.py:111] WalkerStatic: Step: 12420000. Time Elapsed: 11385.310 s Mean Reward: 1399.029. Std of Reward: 607.571. Training.
2020-06-30 22:37:13 INFO [stats.py:111] WalkerStatic: Step: 12450000. Time Elapsed: 11406.259 s Mean Reward: 1333.080. Std of Reward: 648.340. Training.
2020-06-30 22:37:46 INFO [stats.py:111] WalkerStatic: Step: 12480000. Time Elapsed: 11439.552 s Mean Reward: 1294.832. Std of Reward: 677.133. Training.
2020-06-30 22:38:09 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 22:38:13 INFO [stats.py:111] WalkerStatic: Step: 12510000. Time Elapsed: 11466.491 s Mean Reward: 1397.599. Std of Reward: 574.462. Training.
2020-06-30 22:38:40 INFO [stats.py:111] WalkerStatic: Step: 12540000. Time Elapsed: 11493.547 s Mean Reward: 1350.306. Std of Reward: 626.533. Training.
2020-06-30 22:39:11 INFO [stats.py:111] WalkerStatic: Step: 12570000. Time Elapsed: 11523.708 s Mean Reward: 1314.539. Std of Reward: 671.073. Training.
2020-06-30 22:39:38 INFO [stats.py:111] WalkerStatic: Step: 12600000. Time Elapsed: 11550.619 s Mean Reward: 1557.429. Std of Reward: 411.042. Training.
2020-06-30 22:40:03 INFO [stats.py:111] WalkerStatic: Step: 12630000. Time Elapsed: 11576.367 s Mean Reward: 1229.415. Std of Reward: 691.801. Training.
2020-06-30 22:40:39 INFO [stats.py:111] WalkerStatic: Step: 12660000. Time Elapsed: 11611.985 s Mean Reward: 1417.704. Std of Reward: 596.410. Training.
2020-06-30 22:41:02 INFO [stats.py:111] WalkerStatic: Step: 12690000. Time Elapsed: 11635.470 s Mean Reward: 1415.228. Std of Reward: 634.091. Training.
2020-06-30 22:41:24 INFO [stats.py:111] WalkerStatic: Step: 12720000. Time Elapsed: 11656.853 s Mean Reward: 1409.661. Std of Reward: 626.341. Training.
2020-06-30 22:42:00 INFO [stats.py:111] WalkerStatic: Step: 12750000. Time Elapsed: 11692.854 s Mean Reward: 1295.562. Std of Reward: 669.245. Training.
2020-06-30 22:42:21 INFO [stats.py:111] WalkerStatic: Step: 12780000. Time Elapsed: 11714.508 s Mean Reward: 1166.706. Std of Reward: 756.207. Training.
2020-06-30 22:42:59 INFO [stats.py:111] WalkerStatic: Step: 12810000. Time Elapsed: 11751.993 s Mean Reward: 1489.695. Std of Reward: 547.388. Training.
2020-06-30 22:43:22 INFO [stats.py:111] WalkerStatic: Step: 12840000. Time Elapsed: 11775.310 s Mean Reward: 1476.017. Std of Reward: 564.137. Training.
2020-06-30 22:43:49 INFO [stats.py:111] WalkerStatic: Step: 12870000. Time Elapsed: 11802.589 s Mean Reward: 1325.821. Std of Reward: 679.624. Training.
2020-06-30 22:44:22 INFO [stats.py:111] WalkerStatic: Step: 12900000. Time Elapsed: 11835.135 s Mean Reward: 1378.763. Std of Reward: 590.459. Training.
2020-06-30 22:44:43 INFO [stats.py:111] WalkerStatic: Step: 12930000. Time Elapsed: 11856.139 s Mean Reward: 1423.182. Std of Reward: 555.950. Training.
2020-06-30 22:45:11 INFO [stats.py:111] WalkerStatic: Step: 12960000. Time Elapsed: 11884.095 s Mean Reward: 1196.745. Std of Reward: 744.122. Training.
2020-06-30 22:45:40 INFO [stats.py:111] WalkerStatic: Step: 12990000. Time Elapsed: 11913.129 s Mean Reward: 1329.009. Std of Reward: 632.059. Training.
2020-06-30 22:45:51 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 22:46:06 INFO [stats.py:111] WalkerStatic: Step: 13020000. Time Elapsed: 11939.496 s Mean Reward: 1348.211. Std of Reward: 680.374. Training.
2020-06-30 22:46:38 INFO [stats.py:111] WalkerStatic: Step: 13050000. Time Elapsed: 11970.699 s Mean Reward: 1426.208. Std of Reward: 588.717. Training.
2020-06-30 22:47:08 INFO [stats.py:111] WalkerStatic: Step: 13080000. Time Elapsed: 12001.035 s Mean Reward: 1231.722. Std of Reward: 757.799. Training.
2020-06-30 22:47:32 INFO [stats.py:111] WalkerStatic: Step: 13110000. Time Elapsed: 12025.522 s Mean Reward: 1526.864. Std of Reward: 491.280. Training.
2020-06-30 22:48:01 INFO [stats.py:111] WalkerStatic: Step: 13140000. Time Elapsed: 12054.400 s Mean Reward: 1347.069. Std of Reward: 663.483. Training.
2020-06-30 22:48:28 INFO [stats.py:111] WalkerStatic: Step: 13170000. Time Elapsed: 12080.918 s Mean Reward: 1373.006. Std of Reward: 626.000. Training.
2020-06-30 22:48:56 INFO [stats.py:111] WalkerStatic: Step: 13200000. Time Elapsed: 12109.566 s Mean Reward: 1312.450. Std of Reward: 689.553. Training.
2020-06-30 22:49:29 INFO [stats.py:111] WalkerStatic: Step: 13230000. Time Elapsed: 12141.739 s Mean Reward: 1465.898. Std of Reward: 569.284. Training.
2020-06-30 22:49:58 INFO [stats.py:111] WalkerStatic: Step: 13260000. Time Elapsed: 12171.318 s Mean Reward: 1636.905. Std of Reward: 352.475. Training.
2020-06-30 22:50:29 INFO [stats.py:111] WalkerStatic: Step: 13290000. Time Elapsed: 12201.701 s Mean Reward: 1387.284. Std of Reward: 672.510. Training.
2020-06-30 22:50:51 INFO [stats.py:111] WalkerStatic: Step: 13320000. Time Elapsed: 12224.204 s Mean Reward: 1437.847. Std of Reward: 622.796. Training.
2020-06-30 22:51:20 INFO [stats.py:111] WalkerStatic: Step: 13350000. Time Elapsed: 12252.995 s Mean Reward: 1484.142. Std of Reward: 542.917. Training.
2020-06-30 22:51:50 INFO [stats.py:111] WalkerStatic: Step: 13380000. Time Elapsed: 12282.636 s Mean Reward: 1320.893. Std of Reward: 697.935. Training.
2020-06-30 22:52:14 INFO [stats.py:111] WalkerStatic: Step: 13410000. Time Elapsed: 12306.825 s Mean Reward: 1531.120. Std of Reward: 532.695. Training.
2020-06-30 22:52:47 INFO [stats.py:111] WalkerStatic: Step: 13440000. Time Elapsed: 12340.211 s Mean Reward: 1526.508. Std of Reward: 535.060. Training.
2020-06-30 22:53:08 INFO [stats.py:111] WalkerStatic: Step: 13470000. Time Elapsed: 12361.462 s Mean Reward: 1400.324. Std of Reward: 674.158. Training.
2020-06-30 22:53:43 INFO [stats.py:111] WalkerStatic: Step: 13500000. Time Elapsed: 12396.317 s Mean Reward: 1386.591. Std of Reward: 712.121. Training.
2020-06-30 22:53:43 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 22:54:11 INFO [stats.py:111] WalkerStatic: Step: 13530000. Time Elapsed: 12423.861 s Mean Reward: 1436.538. Std of Reward: 652.193. Training.
2020-06-30 22:54:33 INFO [stats.py:111] WalkerStatic: Step: 13560000. Time Elapsed: 12445.827 s Mean Reward: 1561.328. Std of Reward: 519.548. Training.
2020-06-30 22:55:06 INFO [stats.py:111] WalkerStatic: Step: 13590000. Time Elapsed: 12478.859 s Mean Reward: 1318.909. Std of Reward: 661.910. Training.
2020-06-30 22:55:33 INFO [stats.py:111] WalkerStatic: Step: 13620000. Time Elapsed: 12506.232 s Mean Reward: 1464.109. Std of Reward: 629.595. Training.
2020-06-30 22:56:00 INFO [stats.py:111] WalkerStatic: Step: 13650000. Time Elapsed: 12533.013 s Mean Reward: 1476.173. Std of Reward: 622.271. Training.
2020-06-30 22:56:30 INFO [stats.py:111] WalkerStatic: Step: 13680000. Time Elapsed: 12562.659 s Mean Reward: 1255.402. Std of Reward: 759.840. Training.
2020-06-30 22:56:57 INFO [stats.py:111] WalkerStatic: Step: 13710000. Time Elapsed: 12590.159 s Mean Reward: 1342.723. Std of Reward: 669.837. Training.
2020-06-30 22:57:26 INFO [stats.py:111] WalkerStatic: Step: 13740000. Time Elapsed: 12618.614 s Mean Reward: 1375.965. Std of Reward: 696.649. Training.
2020-06-30 22:57:54 INFO [stats.py:111] WalkerStatic: Step: 13770000. Time Elapsed: 12647.520 s Mean Reward: 1374.153. Std of Reward: 666.308. Training.
2020-06-30 22:58:18 INFO [stats.py:111] WalkerStatic: Step: 13800000. Time Elapsed: 12670.917 s Mean Reward: 1369.986. Std of Reward: 672.029. Training.
2020-06-30 22:58:51 INFO [stats.py:111] WalkerStatic: Step: 13830000. Time Elapsed: 12703.622 s Mean Reward: 1321.347. Std of Reward: 697.978. Training.
2020-06-30 22:59:22 INFO [stats.py:111] WalkerStatic: Step: 13860000. Time Elapsed: 12735.316 s Mean Reward: 1520.011. Std of Reward: 569.916. Training.
2020-06-30 22:59:46 INFO [stats.py:111] WalkerStatic: Step: 13890000. Time Elapsed: 12758.661 s Mean Reward: 1426.610. Std of Reward: 675.988. Training.
2020-06-30 23:00:15 INFO [stats.py:111] WalkerStatic: Step: 13920000. Time Elapsed: 12788.091 s Mean Reward: 1372.867. Std of Reward: 673.598. Training.
2020-06-30 23:00:44 INFO [stats.py:111] WalkerStatic: Step: 13950000. Time Elapsed: 12816.804 s Mean Reward: 1383.417. Std of Reward: 704.463. Training.
2020-06-30 23:01:08 INFO [stats.py:111] WalkerStatic: Step: 13980000. Time Elapsed: 12840.825 s Mean Reward: 1410.249. Std of Reward: 688.628. Training.
2020-06-30 23:01:31 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 23:01:40 INFO [stats.py:111] WalkerStatic: Step: 14010000. Time Elapsed: 12873.292 s Mean Reward: 1625.694. Std of Reward: 407.440. Training.
2020-06-30 23:02:07 INFO [stats.py:111] WalkerStatic: Step: 14040000. Time Elapsed: 12900.347 s Mean Reward: 1356.980. Std of Reward: 721.329. Training.
2020-06-30 23:02:34 INFO [stats.py:111] WalkerStatic: Step: 14070000. Time Elapsed: 12927.092 s Mean Reward: 1391.844. Std of Reward: 660.002. Training.
2020-06-30 23:03:04 INFO [stats.py:111] WalkerStatic: Step: 14100000. Time Elapsed: 12957.363 s Mean Reward: 1508.148. Std of Reward: 591.735. Training.
2020-06-30 23:03:33 INFO [stats.py:111] WalkerStatic: Step: 14130000. Time Elapsed: 12986.549 s Mean Reward: 1464.366. Std of Reward: 663.463. Training.
2020-06-30 23:04:01 INFO [stats.py:111] WalkerStatic: Step: 14160000. Time Elapsed: 13014.139 s Mean Reward: 1538.838. Std of Reward: 490.187. Training.
2020-06-30 23:04:25 INFO [stats.py:111] WalkerStatic: Step: 14190000. Time Elapsed: 13037.993 s Mean Reward: 1373.968. Std of Reward: 706.604. Training.
2020-06-30 23:05:05 INFO [stats.py:111] WalkerStatic: Step: 14220000. Time Elapsed: 13077.738 s Mean Reward: 1423.362. Std of Reward: 688.269. Training.
2020-06-30 23:05:23 INFO [stats.py:111] WalkerStatic: Step: 14250000. Time Elapsed: 13096.293 s Mean Reward: 1435.598. Std of Reward: 684.793. Training.
2020-06-30 23:05:52 INFO [stats.py:111] WalkerStatic: Step: 14280000. Time Elapsed: 13124.957 s Mean Reward: 1390.566. Std of Reward: 717.112. Training.
2020-06-30 23:06:26 INFO [stats.py:111] WalkerStatic: Step: 14310000. Time Elapsed: 13159.211 s Mean Reward: 1528.610. Std of Reward: 603.338. Training.
2020-06-30 23:06:49 INFO [stats.py:111] WalkerStatic: Step: 14340000. Time Elapsed: 13182.539 s Mean Reward: 1358.018. Std of Reward: 716.287. Training.
2020-06-30 23:07:14 INFO [stats.py:111] WalkerStatic: Step: 14370000. Time Elapsed: 13207.424 s Mean Reward: 1451.670. Std of Reward: 656.094. Training.
2020-06-30 23:07:56 INFO [stats.py:111] WalkerStatic: Step: 14400000. Time Elapsed: 13249.191 s Mean Reward: 1608.958. Std of Reward: 469.130. Training.
2020-06-30 23:08:13 INFO [stats.py:111] WalkerStatic: Step: 14430000. Time Elapsed: 13266.062 s Mean Reward: 1576.854. Std of Reward: 544.477. Training.
2020-06-30 23:08:40 INFO [stats.py:111] WalkerStatic: Step: 14460000. Time Elapsed: 13293.457 s Mean Reward: 1711.421. Std of Reward: 324.921. Training.
2020-06-30 23:09:15 INFO [stats.py:111] WalkerStatic: Step: 14490000. Time Elapsed: 13328.288 s Mean Reward: 1504.487. Std of Reward: 599.734. Training.
2020-06-30 23:09:21 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 23:09:36 INFO [stats.py:111] WalkerStatic: Step: 14520000. Time Elapsed: 13349.523 s Mean Reward: 1439.802. Std of Reward: 701.831. Training.
2020-06-30 23:10:12 INFO [stats.py:111] WalkerStatic: Step: 14550000. Time Elapsed: 13384.933 s Mean Reward: 1429.246. Std of Reward: 694.906. Training.
2020-06-30 23:10:40 INFO [stats.py:111] WalkerStatic: Step: 14580000. Time Elapsed: 13412.864 s Mean Reward: 1543.612. Std of Reward: 592.021. Training.
2020-06-30 23:11:10 INFO [stats.py:111] WalkerStatic: Step: 14610000. Time Elapsed: 13442.787 s Mean Reward: 1601.804. Std of Reward: 498.704. Training.
2020-06-30 23:11:32 INFO [stats.py:111] WalkerStatic: Step: 14640000. Time Elapsed: 13464.828 s Mean Reward: 1429.478. Std of Reward: 682.872. Training.
2020-06-30 23:12:05 INFO [stats.py:111] WalkerStatic: Step: 14670000. Time Elapsed: 13497.616 s Mean Reward: 1357.800. Std of Reward: 708.474. Training.
2020-06-30 23:12:30 INFO [stats.py:111] WalkerStatic: Step: 14700000. Time Elapsed: 13523.316 s Mean Reward: 1517.970. Std of Reward: 628.402. Training.
2020-06-30 23:12:57 INFO [stats.py:111] WalkerStatic: Step: 14730000. Time Elapsed: 13549.622 s Mean Reward: 1507.052. Std of Reward: 644.882. Training.
2020-06-30 23:13:28 INFO [stats.py:111] WalkerStatic: Step: 14760000. Time Elapsed: 13580.937 s Mean Reward: 1599.471. Std of Reward: 486.803. Training.
2020-06-30 23:13:53 INFO [stats.py:111] WalkerStatic: Step: 14790000. Time Elapsed: 13606.028 s Mean Reward: 1410.003. Std of Reward: 705.282. Training.
2020-06-30 23:14:25 INFO [stats.py:111] WalkerStatic: Step: 14820000. Time Elapsed: 13638.341 s Mean Reward: 1464.719. Std of Reward: 679.789. Training.
2020-06-30 23:14:54 INFO [stats.py:111] WalkerStatic: Step: 14850000. Time Elapsed: 13666.839 s Mean Reward: 1669.683. Std of Reward: 434.323. Training.
2020-06-30 23:15:15 INFO [stats.py:111] WalkerStatic: Step: 14880000. Time Elapsed: 13687.907 s Mean Reward: 1600.610. Std of Reward: 584.006. Training.
2020-06-30 23:15:52 INFO [stats.py:111] WalkerStatic: Step: 14910000. Time Elapsed: 13725.605 s Mean Reward: 1435.785. Std of Reward: 674.327. Training.
2020-06-30 23:16:18 INFO [stats.py:111] WalkerStatic: Step: 14940000. Time Elapsed: 13750.703 s Mean Reward: 1633.326. Std of Reward: 504.135. Training.
2020-06-30 23:16:42 INFO [stats.py:111] WalkerStatic: Step: 14970000. Time Elapsed: 13775.063 s Mean Reward: 1388.771. Std of Reward: 750.246. Training.
2020-06-30 23:17:15 INFO [stats.py:111] WalkerStatic: Step: 15000000. Time Elapsed: 13807.970 s Mean Reward: 1581.420. Std of Reward: 601.645. Training.
2020-06-30 23:17:15 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 23:17:44 INFO [stats.py:111] WalkerStatic: Step: 15030000. Time Elapsed: 13836.644 s Mean Reward: 1508.107. Std of Reward: 649.380. Training.
2020-06-30 23:18:06 INFO [stats.py:111] WalkerStatic: Step: 15060000. Time Elapsed: 13859.250 s Mean Reward: 1549.026. Std of Reward: 627.569. Training.
2020-06-30 23:18:37 INFO [stats.py:111] WalkerStatic: Step: 15090000. Time Elapsed: 13890.531 s Mean Reward: 1525.977. Std of Reward: 591.287. Training.
2020-06-30 23:19:06 INFO [stats.py:111] WalkerStatic: Step: 15120000. Time Elapsed: 13919.519 s Mean Reward: 1592.411. Std of Reward: 584.338. Training.
2020-06-30 23:19:31 INFO [stats.py:111] WalkerStatic: Step: 15150000. Time Elapsed: 13944.452 s Mean Reward: 1696.619. Std of Reward: 434.426. Training.
2020-06-30 23:20:07 INFO [stats.py:111] WalkerStatic: Step: 15180000. Time Elapsed: 13979.912 s Mean Reward: 1408.948. Std of Reward: 735.683. Training.
2020-06-30 23:20:29 INFO [stats.py:111] WalkerStatic: Step: 15210000. Time Elapsed: 14002.455 s Mean Reward: 1709.948. Std of Reward: 444.279. Training.
2020-06-30 23:21:04 INFO [stats.py:111] WalkerStatic: Step: 15240000. Time Elapsed: 14037.192 s Mean Reward: 1547.174. Std of Reward: 615.505. Training.
2020-06-30 23:21:30 INFO [stats.py:111] WalkerStatic: Step: 15270000. Time Elapsed: 14062.722 s Mean Reward: 1704.253. Std of Reward: 452.964. Training.
2020-06-30 23:21:53 INFO [stats.py:111] WalkerStatic: Step: 15300000. Time Elapsed: 14086.416 s Mean Reward: 1648.783. Std of Reward: 430.018. Training.
2020-06-30 23:22:25 INFO [stats.py:111] WalkerStatic: Step: 15330000. Time Elapsed: 14118.249 s Mean Reward: 1494.318. Std of Reward: 659.352. Training.
2020-06-30 23:22:54 INFO [stats.py:111] WalkerStatic: Step: 15360000. Time Elapsed: 14147.567 s Mean Reward: 1520.376. Std of Reward: 681.337. Training.
2020-06-30 23:23:21 INFO [stats.py:111] WalkerStatic: Step: 15390000. Time Elapsed: 14174.131 s Mean Reward: 1383.976. Std of Reward: 774.402. Training.
2020-06-30 23:23:52 INFO [stats.py:111] WalkerStatic: Step: 15420000. Time Elapsed: 14205.109 s Mean Reward: 1548.868. Std of Reward: 576.869. Training.
2020-06-30 23:24:23 INFO [stats.py:111] WalkerStatic: Step: 15450000. Time Elapsed: 14236.311 s Mean Reward: 1672.415. Std of Reward: 455.143. Training.
2020-06-30 23:24:44 INFO [stats.py:111] WalkerStatic: Step: 15480000. Time Elapsed: 14256.639 s Mean Reward: 1543.749. Std of Reward: 655.750. Training.
2020-06-30 23:25:04 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 23:25:12 INFO [stats.py:111] WalkerStatic: Step: 15510000. Time Elapsed: 14285.160 s Mean Reward: 1480.699. Std of Reward: 681.469. Training.
2020-06-30 23:25:46 INFO [stats.py:111] WalkerStatic: Step: 15540000. Time Elapsed: 14318.991 s Mean Reward: 1584.875. Std of Reward: 618.795. Training.
2020-06-30 23:26:10 INFO [stats.py:111] WalkerStatic: Step: 15570000. Time Elapsed: 14343.296 s Mean Reward: 1418.800. Std of Reward: 763.862. Training.
2020-06-30 23:26:42 INFO [stats.py:111] WalkerStatic: Step: 15600000. Time Elapsed: 14375.334 s Mean Reward: 1568.627. Std of Reward: 564.415. Training.
2020-06-30 23:27:08 INFO [stats.py:111] WalkerStatic: Step: 15630000. Time Elapsed: 14400.705 s Mean Reward: 1458.795. Std of Reward: 713.920. Training.
2020-06-30 23:27:32 INFO [stats.py:111] WalkerStatic: Step: 15660000. Time Elapsed: 14425.505 s Mean Reward: 1593.866. Std of Reward: 622.848. Training.
2020-06-30 23:28:07 INFO [stats.py:111] WalkerStatic: Step: 15690000. Time Elapsed: 14459.661 s Mean Reward: 1380.821. Std of Reward: 753.780. Training.
2020-06-30 23:28:29 INFO [stats.py:111] WalkerStatic: Step: 15720000. Time Elapsed: 14482.420 s Mean Reward: 1584.214. Std of Reward: 595.240. Training.
2020-06-30 23:29:00 INFO [stats.py:111] WalkerStatic: Step: 15750000. Time Elapsed: 14513.202 s Mean Reward: 1470.143. Std of Reward: 686.427. Training.
2020-06-30 23:29:31 INFO [stats.py:111] WalkerStatic: Step: 15780000. Time Elapsed: 14543.870 s Mean Reward: 1589.058. Std of Reward: 562.533. Training.
2020-06-30 23:29:56 INFO [stats.py:111] WalkerStatic: Step: 15810000. Time Elapsed: 14569.358 s Mean Reward: 1652.359. Std of Reward: 569.430. Training.
2020-06-30 23:30:25 INFO [stats.py:111] WalkerStatic: Step: 15840000. Time Elapsed: 14597.766 s Mean Reward: 1608.029. Std of Reward: 574.493. Training.
2020-06-30 23:30:55 INFO [stats.py:111] WalkerStatic: Step: 15870000. Time Elapsed: 14628.145 s Mean Reward: 1571.394. Std of Reward: 557.088. Training.
2020-06-30 23:31:22 INFO [stats.py:111] WalkerStatic: Step: 15900000. Time Elapsed: 14655.399 s Mean Reward: 1654.974. Std of Reward: 543.366. Training.
2020-06-30 23:31:50 INFO [stats.py:111] WalkerStatic: Step: 15930000. Time Elapsed: 14683.194 s Mean Reward: 1731.055. Std of Reward: 418.586. Training.
2020-06-30 23:32:19 INFO [stats.py:111] WalkerStatic: Step: 15960000. Time Elapsed: 14712.046 s Mean Reward: 1463.218. Std of Reward: 701.716. Training.
2020-06-30 23:32:47 INFO [stats.py:111] WalkerStatic: Step: 15990000. Time Elapsed: 14740.359 s Mean Reward: 1501.521. Std of Reward: 697.691. Training.
2020-06-30 23:32:59 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 23:33:19 INFO [stats.py:111] WalkerStatic: Step: 16020000. Time Elapsed: 14772.123 s Mean Reward: 1503.048. Std of Reward: 723.671. Training.
2020-06-30 23:33:44 INFO [stats.py:111] WalkerStatic: Step: 16050000. Time Elapsed: 14797.360 s Mean Reward: 1460.302. Std of Reward: 717.668. Training.
2020-06-30 23:34:11 INFO [stats.py:111] WalkerStatic: Step: 16080000. Time Elapsed: 14823.912 s Mean Reward: 1720.391. Std of Reward: 400.102. Training.
2020-06-30 23:34:44 INFO [stats.py:111] WalkerStatic: Step: 16110000. Time Elapsed: 14857.265 s Mean Reward: 1520.571. Std of Reward: 676.017. Training.
2020-06-30 23:35:13 INFO [stats.py:111] WalkerStatic: Step: 16140000. Time Elapsed: 14885.916 s Mean Reward: 1565.257. Std of Reward: 662.004. Training.
2020-06-30 23:35:41 INFO [stats.py:111] WalkerStatic: Step: 16170000. Time Elapsed: 14914.032 s Mean Reward: 1672.173. Std of Reward: 543.264. Training.
2020-06-30 23:36:06 INFO [stats.py:111] WalkerStatic: Step: 16200000. Time Elapsed: 14938.633 s Mean Reward: 1541.538. Std of Reward: 676.984. Training.
2020-06-30 23:36:41 INFO [stats.py:111] WalkerStatic: Step: 16230000. Time Elapsed: 14974.080 s Mean Reward: 1678.384. Std of Reward: 495.662. Training.
2020-06-30 23:37:03 INFO [stats.py:111] WalkerStatic: Step: 16260000. Time Elapsed: 14996.418 s Mean Reward: 1429.583. Std of Reward: 766.427. Training.
2020-06-30 23:37:34 INFO [stats.py:111] WalkerStatic: Step: 16290000. Time Elapsed: 15027.201 s Mean Reward: 1521.146. Std of Reward: 693.312. Training.
2020-06-30 23:38:02 INFO [stats.py:111] WalkerStatic: Step: 16320000. Time Elapsed: 15055.347 s Mean Reward: 1439.150. Std of Reward: 728.350. Training.
2020-06-30 23:38:29 INFO [stats.py:111] WalkerStatic: Step: 16350000. Time Elapsed: 15082.359 s Mean Reward: 1473.283. Std of Reward: 727.308. Training.
2020-06-30 23:39:01 INFO [stats.py:111] WalkerStatic: Step: 16380000. Time Elapsed: 15114.043 s Mean Reward: 1766.535. Std of Reward: 385.780. Training.
2020-06-30 23:39:27 INFO [stats.py:111] WalkerStatic: Step: 16410000. Time Elapsed: 15139.901 s Mean Reward: 1753.963. Std of Reward: 452.657. Training.
2020-06-30 23:39:58 INFO [stats.py:111] WalkerStatic: Step: 16440000. Time Elapsed: 15171.411 s Mean Reward: 1601.287. Std of Reward: 637.781. Training.
2020-06-30 23:40:24 INFO [stats.py:111] WalkerStatic: Step: 16470000. Time Elapsed: 15196.648 s Mean Reward: 1629.817. Std of Reward: 505.323. Training.
2020-06-30 23:40:51 INFO [stats.py:111] WalkerStatic: Step: 16500000. Time Elapsed: 15224.268 s Mean Reward: 1616.576. Std of Reward: 573.205. Training.
2020-06-30 23:40:51 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 23:41:24 INFO [stats.py:111] WalkerStatic: Step: 16530000. Time Elapsed: 15256.916 s Mean Reward: 1652.540. Std of Reward: 447.802. Training.
2020-06-30 23:41:45 INFO [stats.py:111] WalkerStatic: Step: 16560000. Time Elapsed: 15278.301 s Mean Reward: 1507.726. Std of Reward: 696.080. Training.
2020-06-30 23:42:19 INFO [stats.py:111] WalkerStatic: Step: 16590000. Time Elapsed: 15312.415 s Mean Reward: 1555.998. Std of Reward: 590.109. Training.
2020-06-30 23:42:47 INFO [stats.py:111] WalkerStatic: Step: 16620000. Time Elapsed: 15339.636 s Mean Reward: 1825.314. Std of Reward: 164.940. Training.
2020-06-30 23:43:19 INFO [stats.py:111] WalkerStatic: Step: 16650000. Time Elapsed: 15371.930 s Mean Reward: 1591.344. Std of Reward: 648.942. Training.
2020-06-30 23:43:41 INFO [stats.py:111] WalkerStatic: Step: 16680000. Time Elapsed: 15394.355 s Mean Reward: 1627.383. Std of Reward: 567.681. Training.
2020-06-30 23:44:11 INFO [stats.py:111] WalkerStatic: Step: 16710000. Time Elapsed: 15423.652 s Mean Reward: 1634.761. Std of Reward: 541.901. Training.
2020-06-30 23:44:47 INFO [stats.py:111] WalkerStatic: Step: 16740000. Time Elapsed: 15459.710 s Mean Reward: 1740.534. Std of Reward: 426.485. Training.
2020-06-30 23:45:06 INFO [stats.py:111] WalkerStatic: Step: 16770000. Time Elapsed: 15478.837 s Mean Reward: 1630.544. Std of Reward: 624.824. Training.
2020-06-30 23:45:38 INFO [stats.py:111] WalkerStatic: Step: 16800000. Time Elapsed: 15511.465 s Mean Reward: 1757.732. Std of Reward: 318.742. Training.
2020-06-30 23:46:10 INFO [stats.py:111] WalkerStatic: Step: 16830000. Time Elapsed: 15542.663 s Mean Reward: 1838.819. Std of Reward: 335.897. Training.
2020-06-30 23:46:35 INFO [stats.py:111] WalkerStatic: Step: 16860000. Time Elapsed: 15567.704 s Mean Reward: 1662.104. Std of Reward: 562.666. Training.
2020-06-30 23:47:03 INFO [stats.py:111] WalkerStatic: Step: 16890000. Time Elapsed: 15596.314 s Mean Reward: 1446.075. Std of Reward: 742.389. Training.
2020-06-30 23:47:33 INFO [stats.py:111] WalkerStatic: Step: 16920000. Time Elapsed: 15626.045 s Mean Reward: 1449.493. Std of Reward: 745.169. Training.
2020-06-30 23:48:01 INFO [stats.py:111] WalkerStatic: Step: 16950000. Time Elapsed: 15654.254 s Mean Reward: 1351.961. Std of Reward: 807.209. Training.
2020-06-30 23:48:26 INFO [stats.py:111] WalkerStatic: Step: 16980000. Time Elapsed: 15679.605 s Mean Reward: 1551.124. Std of Reward: 683.331. Training.
2020-06-30 23:48:46 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 23:49:01 INFO [stats.py:111] WalkerStatic: Step: 17010000. Time Elapsed: 15713.648 s Mean Reward: 1724.592. Std of Reward: 498.634. Training.
2020-06-30 23:49:27 INFO [stats.py:111] WalkerStatic: Step: 17040000. Time Elapsed: 15740.353 s Mean Reward: 1744.982. Std of Reward: 455.774. Training.
2020-06-30 23:49:55 INFO [stats.py:111] WalkerStatic: Step: 17070000. Time Elapsed: 15768.040 s Mean Reward: 1440.228. Std of Reward: 734.207. Training.
2020-06-30 23:50:24 INFO [stats.py:111] WalkerStatic: Step: 17100000. Time Elapsed: 15796.929 s Mean Reward: 1586.317. Std of Reward: 593.505. Training.
2020-06-30 23:50:55 INFO [stats.py:111] WalkerStatic: Step: 17130000. Time Elapsed: 15828.120 s Mean Reward: 1713.612. Std of Reward: 499.915. Training.
2020-06-30 23:51:24 INFO [stats.py:111] WalkerStatic: Step: 17160000. Time Elapsed: 15856.860 s Mean Reward: 1435.882. Std of Reward: 788.577. Training.
2020-06-30 23:51:52 INFO [stats.py:111] WalkerStatic: Step: 17190000. Time Elapsed: 15885.538 s Mean Reward: 1695.952. Std of Reward: 546.726. Training.
2020-06-30 23:52:19 INFO [stats.py:111] WalkerStatic: Step: 17220000. Time Elapsed: 15911.855 s Mean Reward: 1693.042. Std of Reward: 542.060. Training.
2020-06-30 23:52:49 INFO [stats.py:111] WalkerStatic: Step: 17250000. Time Elapsed: 15941.706 s Mean Reward: 1597.625. Std of Reward: 595.277. Training.
2020-06-30 23:53:21 INFO [stats.py:111] WalkerStatic: Step: 17280000. Time Elapsed: 15973.926 s Mean Reward: 1481.971. Std of Reward: 748.128. Training.
2020-06-30 23:53:44 INFO [stats.py:111] WalkerStatic: Step: 17310000. Time Elapsed: 15997.415 s Mean Reward: 1688.731. Std of Reward: 536.003. Training.
2020-06-30 23:54:12 INFO [stats.py:111] WalkerStatic: Step: 17340000. Time Elapsed: 16024.862 s Mean Reward: 1575.845. Std of Reward: 686.216. Training.
2020-06-30 23:54:41 INFO [stats.py:111] WalkerStatic: Step: 17370000. Time Elapsed: 16054.045 s Mean Reward: 1445.053. Std of Reward: 725.101. Training.
2020-06-30 23:55:08 INFO [stats.py:111] WalkerStatic: Step: 17400000. Time Elapsed: 16081.569 s Mean Reward: 1666.694. Std of Reward: 610.830. Training.
2020-06-30 23:55:41 INFO [stats.py:111] WalkerStatic: Step: 17430000. Time Elapsed: 16113.815 s Mean Reward: 1793.459. Std of Reward: 374.053. Training.
2020-06-30 23:56:06 INFO [stats.py:111] WalkerStatic: Step: 17460000. Time Elapsed: 16138.948 s Mean Reward: 1724.042. Std of Reward: 540.610. Training.
2020-06-30 23:56:40 INFO [stats.py:111] WalkerStatic: Step: 17490000. Time Elapsed: 16173.018 s Mean Reward: 1732.948. Std of Reward: 547.089. Training.
2020-06-30 23:56:47 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-06-30 23:57:03 INFO [stats.py:111] WalkerStatic: Step: 17520000. Time Elapsed: 16196.321 s Mean Reward: 1764.580. Std of Reward: 474.662. Training.
2020-06-30 23:57:29 INFO [stats.py:111] WalkerStatic: Step: 17550000. Time Elapsed: 16222.113 s Mean Reward: 1613.557. Std of Reward: 637.704. Training.
2020-06-30 23:58:04 INFO [stats.py:111] WalkerStatic: Step: 17580000. Time Elapsed: 16256.991 s Mean Reward: 1757.004. Std of Reward: 470.767. Training.
2020-06-30 23:58:27 INFO [stats.py:111] WalkerStatic: Step: 17610000. Time Elapsed: 16280.054 s Mean Reward: 1720.958. Std of Reward: 554.897. Training.
2020-06-30 23:59:00 INFO [stats.py:111] WalkerStatic: Step: 17640000. Time Elapsed: 16313.211 s Mean Reward: 1570.404. Std of Reward: 675.361. Training.
2020-06-30 23:59:27 INFO [stats.py:111] WalkerStatic: Step: 17670000. Time Elapsed: 16340.215 s Mean Reward: 1725.833. Std of Reward: 513.170. Training.
2020-06-30 23:59:56 INFO [stats.py:111] WalkerStatic: Step: 17700000. Time Elapsed: 16368.739 s Mean Reward: 1682.692. Std of Reward: 579.002. Training.
2020-07-01 00:00:25 INFO [stats.py:111] WalkerStatic: Step: 17730000. Time Elapsed: 16398.123 s Mean Reward: 1587.789. Std of Reward: 584.483. Training.
2020-07-01 00:00:57 INFO [stats.py:111] WalkerStatic: Step: 17760000. Time Elapsed: 16430.109 s Mean Reward: 1571.020. Std of Reward: 631.367. Training.
2020-07-01 00:01:20 INFO [stats.py:111] WalkerStatic: Step: 17790000. Time Elapsed: 16452.954 s Mean Reward: 1526.773. Std of Reward: 724.304. Training.
2020-07-01 00:01:51 INFO [stats.py:111] WalkerStatic: Step: 17820000. Time Elapsed: 16483.655 s Mean Reward: 1698.627. Std of Reward: 583.733. Training.
2020-07-01 00:02:19 INFO [stats.py:111] WalkerStatic: Step: 17850000. Time Elapsed: 16512.540 s Mean Reward: 1732.540. Std of Reward: 501.208. Training.
2020-07-01 00:02:49 INFO [stats.py:111] WalkerStatic: Step: 17880000. Time Elapsed: 16542.595 s Mean Reward: 1801.683. Std of Reward: 441.186. Training.
2020-07-01 00:03:18 INFO [stats.py:111] WalkerStatic: Step: 17910000. Time Elapsed: 16571.005 s Mean Reward: 1717.470. Std of Reward: 527.074. Training.
2020-07-01 00:03:43 INFO [stats.py:111] WalkerStatic: Step: 17940000. Time Elapsed: 16596.602 s Mean Reward: 1732.511. Std of Reward: 520.227. Training.
2020-07-01 00:04:15 INFO [stats.py:111] WalkerStatic: Step: 17970000. Time Elapsed: 16628.385 s Mean Reward: 1727.320. Std of Reward: 520.541. Training.
2020-07-01 00:04:42 INFO [stats.py:111] WalkerStatic: Step: 18000000. Time Elapsed: 16655.314 s Mean Reward: 1766.479. Std of Reward: 459.709. Training.
2020-07-01 00:04:42 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-07-01 00:05:09 INFO [stats.py:111] WalkerStatic: Step: 18030000. Time Elapsed: 16681.943 s Mean Reward: 1613.589. Std of Reward: 665.380. Training.
2020-07-01 00:05:42 INFO [stats.py:111] WalkerStatic: Step: 18060000. Time Elapsed: 16714.760 s Mean Reward: 1720.373. Std of Reward: 555.064. Training.
2020-07-01 00:06:08 INFO [stats.py:111] WalkerStatic: Step: 18090000. Time Elapsed: 16740.616 s Mean Reward: 1801.277. Std of Reward: 402.958. Training.
2020-07-01 00:06:38 INFO [stats.py:111] WalkerStatic: Step: 18120000. Time Elapsed: 16770.778 s Mean Reward: 1539.256. Std of Reward: 715.631. Training.
2020-07-01 00:07:07 INFO [stats.py:111] WalkerStatic: Step: 18150000. Time Elapsed: 16800.337 s Mean Reward: 1580.855. Std of Reward: 679.002. Training.
2020-07-01 00:07:39 INFO [stats.py:111] WalkerStatic: Step: 18180000. Time Elapsed: 16832.138 s Mean Reward: 1729.427. Std of Reward: 526.603. Training.
2020-07-01 00:08:03 INFO [stats.py:111] WalkerStatic: Step: 18210000. Time Elapsed: 16855.791 s Mean Reward: 1801.274. Std of Reward: 466.048. Training.
2020-07-01 00:08:33 INFO [stats.py:111] WalkerStatic: Step: 18240000. Time Elapsed: 16885.626 s Mean Reward: 1799.869. Std of Reward: 346.247. Training.
2020-07-01 00:08:58 INFO [stats.py:111] WalkerStatic: Step: 18270000. Time Elapsed: 16911.527 s Mean Reward: 1679.641. Std of Reward: 535.959. Training.
2020-07-01 00:09:30 INFO [stats.py:111] WalkerStatic: Step: 18300000. Time Elapsed: 16942.730 s Mean Reward: 1634.837. Std of Reward: 574.288. Training.
2020-07-01 00:09:59 INFO [stats.py:111] WalkerStatic: Step: 18330000. Time Elapsed: 16971.814 s Mean Reward: 1661.358. Std of Reward: 611.450. Training.
2020-07-01 00:10:23 INFO [stats.py:111] WalkerStatic: Step: 18360000. Time Elapsed: 16996.540 s Mean Reward: 1654.338. Std of Reward: 664.214. Training.
2020-07-01 00:11:00 INFO [stats.py:111] WalkerStatic: Step: 18390000. Time Elapsed: 17033.090 s Mean Reward: 1759.014. Std of Reward: 483.189. Training.
2020-07-01 00:11:21 INFO [stats.py:111] WalkerStatic: Step: 18420000. Time Elapsed: 17053.929 s Mean Reward: 1686.015. Std of Reward: 568.541. Training.
2020-07-01 00:11:53 INFO [stats.py:111] WalkerStatic: Step: 18450000. Time Elapsed: 17085.646 s Mean Reward: 1542.289. Std of Reward: 684.068. Training.
2020-07-01 00:12:23 INFO [stats.py:111] WalkerStatic: Step: 18480000. Time Elapsed: 17116.160 s Mean Reward: 1811.515. Std of Reward: 394.366. Training.
2020-07-01 00:12:39 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-07-01 00:12:46 INFO [stats.py:111] WalkerStatic: Step: 18510000. Time Elapsed: 17139.549 s Mean Reward: 1518.539. Std of Reward: 690.344. Training.
2020-07-01 00:13:21 INFO [stats.py:111] WalkerStatic: Step: 18540000. Time Elapsed: 17173.827 s Mean Reward: 1558.287. Std of Reward: 724.497. Training.
2020-07-01 00:13:48 INFO [stats.py:111] WalkerStatic: Step: 18570000. Time Elapsed: 17200.654 s Mean Reward: 1849.475. Std of Reward: 358.406. Training.
2020-07-01 00:14:14 INFO [stats.py:111] WalkerStatic: Step: 18600000. Time Elapsed: 17226.893 s Mean Reward: 1779.310. Std of Reward: 491.861. Training.
2020-07-01 00:14:44 INFO [stats.py:111] WalkerStatic: Step: 18630000. Time Elapsed: 17257.542 s Mean Reward: 1697.112. Std of Reward: 530.330. Training.
2020-07-01 00:15:11 INFO [stats.py:111] WalkerStatic: Step: 18660000. Time Elapsed: 17283.611 s Mean Reward: 1641.041. Std of Reward: 599.301. Training.
2020-07-01 00:15:45 INFO [stats.py:111] WalkerStatic: Step: 18690000. Time Elapsed: 17318.507 s Mean Reward: 1764.871. Std of Reward: 554.192. Training.
2020-07-01 00:16:09 INFO [stats.py:111] WalkerStatic: Step: 18720000. Time Elapsed: 17341.792 s Mean Reward: 1790.089. Std of Reward: 465.890. Training.
2020-07-01 00:16:36 INFO [stats.py:111] WalkerStatic: Step: 18750000. Time Elapsed: 17369.568 s Mean Reward: 1522.876. Std of Reward: 756.969. Training.
2020-07-01 00:17:09 INFO [stats.py:111] WalkerStatic: Step: 18780000. Time Elapsed: 17402.388 s Mean Reward: 1696.710. Std of Reward: 550.798. Training.
2020-07-01 00:17:36 INFO [stats.py:111] WalkerStatic: Step: 18810000. Time Elapsed: 17429.454 s Mean Reward: 1653.769. Std of Reward: 539.054. Training.
2020-07-01 00:18:06 INFO [stats.py:111] WalkerStatic: Step: 18840000. Time Elapsed: 17459.419 s Mean Reward: 1626.126. Std of Reward: 682.944. Training.
2020-07-01 00:18:33 INFO [stats.py:111] WalkerStatic: Step: 18870000. Time Elapsed: 17486.117 s Mean Reward: 1877.153. Std of Reward: 214.023. Training.
2020-07-01 00:18:59 INFO [stats.py:111] WalkerStatic: Step: 18900000. Time Elapsed: 17512.293 s Mean Reward: 1768.873. Std of Reward: 559.222. Training.
2020-07-01 00:19:31 INFO [stats.py:111] WalkerStatic: Step: 18930000. Time Elapsed: 17544.410 s Mean Reward: 1816.893. Std of Reward: 430.162. Training.
2020-07-01 00:20:01 INFO [stats.py:111] WalkerStatic: Step: 18960000. Time Elapsed: 17574.321 s Mean Reward: 1819.198. Std of Reward: 469.058. Training.
2020-07-01 00:20:25 INFO [stats.py:111] WalkerStatic: Step: 18990000. Time Elapsed: 17598.342 s Mean Reward: 1677.456. Std of Reward: 615.260. Training.
2020-07-01 00:20:38 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-07-01 00:21:00 INFO [stats.py:111] WalkerStatic: Step: 19020000. Time Elapsed: 17632.790 s Mean Reward: 1649.353. Std of Reward: 578.392. Training.
2020-07-01 00:21:24 INFO [stats.py:111] WalkerStatic: Step: 19050000. Time Elapsed: 17656.929 s Mean Reward: 1544.788. Std of Reward: 740.525. Training.
2020-07-01 00:21:50 INFO [stats.py:111] WalkerStatic: Step: 19080000. Time Elapsed: 17683.026 s Mean Reward: 1498.153. Std of Reward: 749.148. Training.
2020-07-01 00:22:25 INFO [stats.py:111] WalkerStatic: Step: 19110000. Time Elapsed: 17718.290 s Mean Reward: 1693.696. Std of Reward: 627.378. Training.
2020-07-01 00:22:48 INFO [stats.py:111] WalkerStatic: Step: 19140000. Time Elapsed: 17741.347 s Mean Reward: 1729.633. Std of Reward: 573.028. Training.
2020-07-01 00:23:22 INFO [stats.py:111] WalkerStatic: Step: 19170000. Time Elapsed: 17774.657 s Mean Reward: 1793.747. Std of Reward: 454.902. Training.
2020-07-01 00:23:47 INFO [stats.py:111] WalkerStatic: Step: 19200000. Time Elapsed: 17799.801 s Mean Reward: 1826.085. Std of Reward: 355.787. Training.
2020-07-01 00:24:20 INFO [stats.py:111] WalkerStatic: Step: 19230000. Time Elapsed: 17833.026 s Mean Reward: 1764.353. Std of Reward: 443.788. Training.
2020-07-01 00:24:49 INFO [stats.py:111] WalkerStatic: Step: 19260000. Time Elapsed: 17861.701 s Mean Reward: 1617.650. Std of Reward: 710.887. Training.
2020-07-01 00:25:11 INFO [stats.py:111] WalkerStatic: Step: 19290000. Time Elapsed: 17883.994 s Mean Reward: 1745.127. Std of Reward: 493.975. Training.
2020-07-01 00:25:42 INFO [stats.py:111] WalkerStatic: Step: 19320000. Time Elapsed: 17915.222 s Mean Reward: 1617.051. Std of Reward: 619.699. Training.
2020-07-01 00:26:11 INFO [stats.py:111] WalkerStatic: Step: 19350000. Time Elapsed: 17943.728 s Mean Reward: 1534.544. Std of Reward: 704.722. Training.
2020-07-01 00:26:44 INFO [stats.py:111] WalkerStatic: Step: 19380000. Time Elapsed: 17976.701 s Mean Reward: 1813.313. Std of Reward: 392.799. Training.
2020-07-01 00:27:06 INFO [stats.py:111] WalkerStatic: Step: 19410000. Time Elapsed: 17998.917 s Mean Reward: 1864.139. Std of Reward: 356.035. Training.
2020-07-01 00:27:39 INFO [stats.py:111] WalkerStatic: Step: 19440000. Time Elapsed: 18031.925 s Mean Reward: 1841.236. Std of Reward: 389.342. Training.
2020-07-01 00:28:09 INFO [stats.py:111] WalkerStatic: Step: 19470000. Time Elapsed: 18062.012 s Mean Reward: 1673.219. Std of Reward: 602.167. Training.
2020-07-01 00:28:32 INFO [stats.py:111] WalkerStatic: Step: 19500000. Time Elapsed: 18085.119 s Mean Reward: 1788.462. Std of Reward: 495.037. Training.
2020-07-01 00:28:32 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-07-01 00:29:04 INFO [stats.py:111] WalkerStatic: Step: 19530000. Time Elapsed: 18116.890 s Mean Reward: 1590.050. Std of Reward: 690.701. Training.
2020-07-01 00:29:33 INFO [stats.py:111] WalkerStatic: Step: 19560000. Time Elapsed: 18146.414 s Mean Reward: 1660.181. Std of Reward: 669.779. Training.
2020-07-01 00:30:01 INFO [stats.py:111] WalkerStatic: Step: 19590000. Time Elapsed: 18173.641 s Mean Reward: 1714.879. Std of Reward: 635.579. Training.
2020-07-01 00:30:29 INFO [stats.py:111] WalkerStatic: Step: 19620000. Time Elapsed: 18202.599 s Mean Reward: 1575.466. Std of Reward: 753.427. Training.
2020-07-01 00:31:01 INFO [stats.py:111] WalkerStatic: Step: 19650000. Time Elapsed: 18234.069 s Mean Reward: 1751.437. Std of Reward: 546.481. Training.
2020-07-01 00:31:26 INFO [stats.py:111] WalkerStatic: Step: 19680000. Time Elapsed: 18259.204 s Mean Reward: 1817.724. Std of Reward: 429.534. Training.
2020-07-01 00:31:53 INFO [stats.py:111] WalkerStatic: Step: 19710000. Time Elapsed: 18286.154 s Mean Reward: 1675.560. Std of Reward: 593.033. Training.
2020-07-01 00:32:27 INFO [stats.py:111] WalkerStatic: Step: 19740000. Time Elapsed: 18319.795 s Mean Reward: 1906.111. Std of Reward: 183.691. Training.
2020-07-01 00:32:54 INFO [stats.py:111] WalkerStatic: Step: 19770000. Time Elapsed: 18347.409 s Mean Reward: 1589.797. Std of Reward: 708.289. Training.
2020-07-01 00:33:24 INFO [stats.py:111] WalkerStatic: Step: 19800000. Time Elapsed: 18376.925 s Mean Reward: 1841.268. Std of Reward: 367.156. Training.
2020-07-01 00:33:52 INFO [stats.py:111] WalkerStatic: Step: 19830000. Time Elapsed: 18405.140 s Mean Reward: 1793.462. Std of Reward: 395.691. Training.
2020-07-01 00:34:22 INFO [stats.py:111] WalkerStatic: Step: 19860000. Time Elapsed: 18435.561 s Mean Reward: 1780.873. Std of Reward: 516.889. Training.
2020-07-01 00:34:48 INFO [stats.py:111] WalkerStatic: Step: 19890000. Time Elapsed: 18461.290 s Mean Reward: 1781.638. Std of Reward: 483.338. Training.
2020-07-01 00:35:15 INFO [stats.py:111] WalkerStatic: Step: 19920000. Time Elapsed: 18488.493 s Mean Reward: 1657.476. Std of Reward: 631.084. Training.
2020-07-01 00:35:46 INFO [stats.py:111] WalkerStatic: Step: 19950000. Time Elapsed: 18518.921 s Mean Reward: 1726.992. Std of Reward: 580.871. Training.
2020-07-01 00:36:12 INFO [stats.py:111] WalkerStatic: Step: 19980000. Time Elapsed: 18545.410 s Mean Reward: 1827.991. Std of Reward: 455.518. Training.
2020-07-01 00:36:33 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.
2020-07-01 00:36:33 INFO [trainer_controller.py:101] Saved Model
2020-07-01 00:36:33 INFO [model_serialization.py:203] List of nodes to export for brain :WalkerStatic
2020-07-01 00:36:33 INFO [model_serialization.py:205] is_continuous_control
2020-07-01 00:36:33 INFO [model_serialization.py:205] trainer_major_version
2020-07-01 00:36:33 INFO [model_serialization.py:205] trainer_minor_version
2020-07-01 00:36:33 INFO [model_serialization.py:205] trainer_patch_version
2020-07-01 00:36:33 INFO [model_serialization.py:205] version_number
2020-07-01 00:36:33 INFO [model_serialization.py:205] memory_size
2020-07-01 00:36:33 INFO [model_serialization.py:205] action_output_shape
2020-07-01 00:36:33 INFO [model_serialization.py:205] action
2020-07-01 00:36:33 INFO [model_serialization.py:205] action_probs
Converting results/wst-ppo/WalkerStatic/frozen_graph_def.pb to results/wst-ppo/WalkerStatic.nn
IGNORED: Cast unknown layer
IGNORED: Shape unknown layer
IGNORED: StopGradient unknown layer
GLOBALS: 'is_continuous_control', 'trainer_major_version', 'trainer_minor_version', 'trainer_patch_version', 'version_number', 'memory_size', 'action_output_shape'
IN: 'vector_observation': [-1, 1, 1, 236] => 'sub_2'
OUT: 'action', 'action_probs'
DONE: wrote results/wst-ppo/WalkerStatic.nn file.
2020-07-01 00:36:33 INFO [model_serialization.py:83] Exported results/wst-ppo/WalkerStatic.nn file
debugger-agent: Unable to listen on 7
2020-07-01 00:36:34 INFO [environment.py:418] Environment shut down with return code 0.
debugger-agent: Unable to listen on 7
2020-07-01 00:36:34 INFO [environment.py:418] Environment shut down with return code 0.
debugger-agent: Unable to listen on 7
2020-07-01 00:36:34 INFO [environment.py:418] Environment shut down with return code 0.
2020-07-01 00:36:35 INFO [environment.py:418] Environment shut down with return code 0.
debugger-agent: Unable to listen on 7
2020-07-01 00:36:35 INFO [environment.py:418] Environment shut down with return code 0.
debugger-agent: Unable to listen on 7
2020-07-01 00:36:36 INFO [environment.py:418] Environment shut down with return code 0.
debugger-agent: Unable to listen on 7
2020-07-01 00:36:36 INFO [environment.py:418] Environment shut down with return code 0.
debugger-agent: Unable to listen on 7
2020-07-01 00:36:36 INFO [environment.py:418] Environment shut down with return code 0.
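
The periodic `stats.py` summary lines above share one shape (timestamp, behavior name, step, elapsed seconds, mean reward, reward standard deviation), so they can be pulled into records for plotting or comparison. The following is a minimal sketch, assuming only the line format visible in this log. The regular expression is inferred from these lines and not taken from the ML-Agents source, so other versions may print a different format. `STATS_RE` and `parse_stats_line` are hypothetical names introduced for illustration.

```python
import re

# Pattern inferred from the stats.py lines in this log (an assumption,
# not an official ML-Agents format specification).
STATS_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) INFO \[stats\.py:\d+\] "
    r"(?P<behavior>\w+): Step: (?P<step>\d+)\. "
    r"Time Elapsed: (?P<elapsed>\d+(?:\.\d+)?) s "
    r"Mean Reward: (?P<mean>-?\d+(?:\.\d+)?)\. "
    r"Std of Reward: (?P<std>\d+(?:\.\d+)?)\. Training\.$"
)


def parse_stats_line(line):
    """Return a dict for a stats.py summary line, or None for any other line."""
    match = STATS_RE.match(line.strip())
    if match is None:
        return None
    return {
        "timestamp": match.group("timestamp"),
        "behavior": match.group("behavior"),
        "step": int(match.group("step")),
        "elapsed_s": float(match.group("elapsed")),
        "mean_reward": float(match.group("mean")),
        "std_reward": float(match.group("std")),
    }


if __name__ == "__main__":
    # Three lines copied from the log above; the checkpoint line is skipped.
    sample = [
        "2020-07-01 00:04:42 INFO [stats.py:111] WalkerStatic: Step: 18000000. "
        "Time Elapsed: 16655.314 s Mean Reward: 1766.479. Std of Reward: 459.709. Training.",
        "2020-07-01 00:04:42 INFO [rl_trainer.py:151] Checkpointing model for WalkerStatic.",
        "2020-07-01 00:32:27 INFO [stats.py:111] WalkerStatic: Step: 19740000. "
        "Time Elapsed: 18319.795 s Mean Reward: 1906.111. Std of Reward: 183.691. Training.",
    ]
    records = [r for r in map(parse_stats_line, sample) if r is not None]
    best = max(records, key=lambda r: r["mean_reward"])
    print(best["step"], best["mean_reward"])
```

Lines that do not match, such as the checkpoint, export, and shutdown messages, return `None` and can simply be filtered out when reading the whole file.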