
Merge branch 'self-play-mutex' into soccer-2v1

/asymm-envs
Andrew Cohen, 4 years ago
Current commit
b42c9482
39 changed files with 6998 additions and 7011 deletions
  1. 2
      .yamato/com.unity.ml-agents-pack.yml
  2. 1
      .yamato/com.unity.ml-agents-test.yml
  3. 1
      .yamato/protobuf-generation-test.yml
  4. 1
      .yamato/standalone-build-test.yml
  5. 1
      .yamato/training-int-tests.yml
  6. 491
      Project/Assets/ML-Agents/Examples/3DBall/TFModels/3DBall.nn
  7. 605
      Project/Assets/ML-Agents/Examples/3DBall/TFModels/3DBallHard.nn
  8. 9
      Project/Assets/ML-Agents/Examples/Basic/TFModels/Basic.nn
  9. 133
      Project/Assets/ML-Agents/Examples/Bouncer/TFModels/Bouncer.nn
  10. 13
      Project/Assets/ML-Agents/Examples/Crawler/Prefabs/FixedPlatform.prefab
  11. 1001
      Project/Assets/ML-Agents/Examples/Crawler/TFModels/CrawlerDynamic.nn
  12. 1001
      Project/Assets/ML-Agents/Examples/Crawler/TFModels/CrawlerStatic.nn
  13. 659
      Project/Assets/ML-Agents/Examples/FoodCollector/TFModels/FoodCollector.nn
  14. 1001
      Project/Assets/ML-Agents/Examples/GridWorld/TFModels/GridWorld.nn
  15. 1001
      Project/Assets/ML-Agents/Examples/Hallway/TFModels/Hallway.nn
  16. 1001
      Project/Assets/ML-Agents/Examples/PushBlock/TFModels/PushBlock.nn
  17. 1001
      Project/Assets/ML-Agents/Examples/Pyramids/TFModels/Pyramids.nn
  18. 567
      Project/Assets/ML-Agents/Examples/Reacher/TFModels/Reacher.nn
  19. 1001
      Project/Assets/ML-Agents/Examples/Soccer/TFModels/Soccer.nn
  20. 1001
      Project/Assets/ML-Agents/Examples/Tennis/TFModels/Tennis.nn
  21. 1001
      Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker.nn
  22. 1001
      Project/Assets/ML-Agents/Examples/WallJump/TFModels/BigWallJump.nn
  23. 1001
      Project/Assets/ML-Agents/Examples/WallJump/TFModels/SmallWallJump.nn
  24. 29
      README.md
  25. 2
      com.unity.ml-agents/CHANGELOG.md
  26. 9
      docs/ML-Agents-Overview.md
  27. 2
      docs/Readme.md
  28. 4
      docs/Training-Curriculum-Learning.md
  29. 5
      docs/Training-ML-Agents.md
  30. 4
      docs/Training-Self-Play.md
  31. 26
      ml-agents/mlagents/trainers/behavior_id_utils.py
  32. 35
      ml-agents/mlagents/trainers/ghost/controller.py
  33. 8
      ml-agents/mlagents/trainers/learn.py
  34. 2
      ml-agents/mlagents/trainers/tests/test_subprocess_env_manager.py
  35. 5
      ml-agents/mlagents/trainers/trainer_util.py
  36. 40
      utils/make_readme_table.py
  37. 171
      docs/Training-Environment-Parameter-Randomization.md
  38. 173
      docs/Training-Generalized-Reinforcement-Learning-Agents.md
  39. 0
      /config/3dball_randomize.yaml

2
.yamato/com.unity.ml-agents-pack.yml


packages:
paths:
- "upm-ci~/packages/**/*"
triggers:
cancel_old_ci: true

1
.yamato/com.unity.ml-agents-test.yml


dependencies:
- .yamato/com.unity.ml-agents-pack.yml#pack
triggers:
cancel_old_ci: true
changes:
only:
- "com.unity.ml-agents/**"

1
.yamato/protobuf-generation-test.yml


echo "Apply the patch with the command 'git apply proto.patch'"; \
git diff -- :/ ":(exclude,top)$CS_PROTO_PATH/*.meta" > artifacts/proto.patch; exit $GIT_ERR; }
triggers:
cancel_old_ci: true
changes:
only:
- "protobuf-definitions/**"

1
.yamato/standalone-build-test.yml


- pip install pyyaml
- python -u -m ml-agents.tests.yamato.standalone_build_tests
triggers:
cancel_old_ci: true
changes:
only:
- "com.unity.ml-agents/**"

1
.yamato/training-int-tests.yml


- pip install pyyaml
- python -u -m ml-agents.tests.yamato.training_int_tests
triggers:
cancel_old_ci: true
changes:
only:
- "com.unity.ml-agents/**"

491
Project/Assets/ML-Agents/Examples/3DBall/TFModels/3DBall.nn
File diff suppressed because it is too large

605
Project/Assets/ML-Agents/Examples/3DBall/TFModels/3DBallHard.nn
File diff suppressed because it is too large

9
Project/Assets/ML-Agents/Examples/Basic/TFModels/Basic.nn


(Binary .nn model data omitted; the serialized graph is not human-readable.)

133
Project/Assets/ML-Agents/Examples/Bouncer/TFModels/Bouncer.nn
File diff suppressed because it is too large

13
Project/Assets/ML-Agents/Examples/Crawler/Prefabs/FixedPlatform.prefab


m_InferenceDevice: 0
m_BehaviorType: 0
m_BehaviorName: CrawlerStatic
m_TeamID: 0
m_useChildSensors: 1
TeamId: 0
m_UseChildSensors: 1
--- !u!114 &114230237520033992
MonoBehaviour:
m_ObjectHideFlags: 0

m_Script: {fileID: 11500000, guid: 2f37c30a5e8d04117947188818902ef3, type: 3}
m_Name:
m_EditorClassIdentifier:
agentParameters:
maxStep: 0
hasUpgradedFromAgentParameters: 1
maxStep: 5000
target: {fileID: 4749909135913778}
ground: {fileID: 4856650706546504}

m_Name:
m_EditorClassIdentifier:
DecisionPeriod: 5
RepeatAction: 0
TakeActionsBetweenDecisions: 0
offsetStep: 0
--- !u!1 &1492926997393242
GameObject:

m_PrefabAsset: {fileID: 0}
m_GameObject: {fileID: 1995322274649904}
m_LocalRotation: {x: 0, y: -0, z: -0, w: 1}
m_LocalPosition: {x: -0, y: 0.5, z: 0}
m_LocalScale: {x: 0.01, y: 0.01, z: 0.01}
m_LocalPosition: {x: -0, y: 1.5, z: 0}
m_LocalScale: {x: 0.01, y: 0.03, z: 0.01}
m_Children: []
m_Father: {fileID: 4924174722017668}
m_RootOrder: 1

1001
Project/Assets/ML-Agents/Examples/Crawler/TFModels/CrawlerDynamic.nn
File diff suppressed because it is too large

1001
Project/Assets/ML-Agents/Examples/Crawler/TFModels/CrawlerStatic.nn
File diff suppressed because it is too large

659
Project/Assets/ML-Agents/Examples/FoodCollector/TFModels/FoodCollector.nn
File diff suppressed because it is too large

1001
Project/Assets/ML-Agents/Examples/GridWorld/TFModels/GridWorld.nn
File diff suppressed because it is too large

1001
Project/Assets/ML-Agents/Examples/Hallway/TFModels/Hallway.nn
File diff suppressed because it is too large

1001
Project/Assets/ML-Agents/Examples/PushBlock/TFModels/PushBlock.nn
File diff suppressed because it is too large

1001
Project/Assets/ML-Agents/Examples/Pyramids/TFModels/Pyramids.nn
File diff suppressed because it is too large

567
Project/Assets/ML-Agents/Examples/Reacher/TFModels/Reacher.nn
File diff suppressed because it is too large

1001
Project/Assets/ML-Agents/Examples/Soccer/TFModels/Soccer.nn
File diff suppressed because it is too large

1001
Project/Assets/ML-Agents/Examples/Tennis/TFModels/Tennis.nn
File diff suppressed because it is too large

1001
Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker.nn
File diff suppressed because it is too large

1001
Project/Assets/ML-Agents/Examples/WallJump/TFModels/BigWallJump.nn
File diff suppressed because it is too large

1001
Project/Assets/ML-Agents/Examples/WallJump/TFModels/SmallWallJump.nn
File diff suppressed because it is too large

29
README.md


* Unity environment control from Python
* 15+ sample Unity environments
* Two deep reinforcement learning algorithms,
[Proximal Policy Optimization](https://github.com/Unity-Technologies/ml-agents/tree/latest_release/docs/Training-PPO.md)
(PPO) and [Soft Actor-Critic](https://github.com/Unity-Technologies/ml-agents/tree/latest_release/docs/Training-SAC.md)
[Proximal Policy Optimization](docs/Training-PPO.md)
(PPO) and [Soft Actor-Critic](docs/Training-SAC.md)
* Built-in support for [Imitation Learning](https://github.com/Unity-Technologies/ml-agents/tree/latest_release/docs/Training-Imitation-Learning.md) through Behavioral Cloning or Generative Adversarial Imitation Learning
* Built-in support for [Imitation Learning](docs/Training-Imitation-Learning.md) through Behavioral Cloning or Generative Adversarial Imitation Learning
* Flexible agent control with On Demand Decision Making
* Visualizing network outputs within the environment
* Wrap learning environments as a gym

## Releases & Documentation
**Our latest, stable release is 0.14.1. Click
**Our latest, stable release is 0.15.0. Click
get started with the latest release of ML-Agents.**
The table below lists all our releases, including our `master` branch which is under active

| **Version** | **Release Date** | **Source** | **Documentation** | **Download** |
|:-------:|:------:|:-------------:|:-------:|:------------:|
| **master** (unstable) | -- | [source](https://github.com/Unity-Technologies/ml-agents/tree/master) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/master/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/master.zip) |
| **0.14.1** (latest stable release) | February 26, 2020 | **[source](https://github.com/Unity-Technologies/ml-agents/tree/latest_release)** | **[docs](https://github.com/Unity-Technologies/ml-agents/tree/latest_release/docs/Readme.md)** | **[download](https://github.com/Unity-Technologies/ml-agents/archive/latest_release.zip)** |
| **0.14.0** | February 13, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.14.0) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.14.0/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.14.0.zip) |
| **0.13.1** | January 21, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.13.1) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.13.1/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.13.1.zip) |
| **0.13.0** | January 8, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.13.0) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.13.0/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.13.0.zip) |
| **0.12.1** | December 11, 2019 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.12.1) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.12.1/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.12.1.zip) |
| **0.12.0** | December 2, 2019 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.12.0) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.12.0/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.12.0.zip) |
| **0.11.0** | November 4, 2019 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.11.0) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.11.0/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.11.0.zip) |
| **0.10.1** | October 9, 2019 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.10.1) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.10.1/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.10.1.zip) |
| **0.10.0** | September 30, 2019 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.10.0) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.10.0/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.10.0.zip) |
| **master (unstable)** | -- | [source](https://github.com/Unity-Technologies/ml-agents/tree/master) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/master/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/master.zip) |
| **0.15.0** | **March 18, 2020** | **[source](https://github.com/Unity-Technologies/ml-agents/tree/0.15.0)** | **[docs](https://github.com/Unity-Technologies/ml-agents/tree/0.15.0/docs/Readme.md)** | **[download](https://github.com/Unity-Technologies/ml-agents/archive/0.15.0.zip)** |
| **0.14.1** | February 26, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.14.1) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.14.1/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.14.1.zip) |
| **0.14.0** | February 13, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.14.0) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.14.0/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.14.0.zip) |
| **0.13.1** | January 21, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.13.1) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.13.1/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.13.1.zip) |
| **0.13.0** | January 8, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.13.0) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.13.0/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.13.0.zip) |
| **0.12.1** | December 11, 2019 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.12.1) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.12.1/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.12.1.zip) |
| **0.12.0** | December 2, 2019 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.12.0) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.12.0/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.12.0.zip) |
| **0.11.0** | November 4, 2019 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.11.0) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.11.0/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.11.0.zip) |
## Citation

2
com.unity.ml-agents/CHANGELOG.md


### Minor Changes
- Format of console output has changed slightly and now matches the name of the model/summary directory. (#3630, #3616)
- Raise the wall in CrawlerStatic scene to prevent Agent from falling off. (#3650)
- Renamed 'Generalization' feature to 'Environment Parameter Randomization'.
## [0.15.0-preview] - 2020-03-18
### Major Changes

9
docs/ML-Agents-Overview.md


learn more about adding visual observations to an agent
[here](Learning-Environment-Design-Agents.md#multiple-visual-observations).
- **Training with Reset Parameter Sampling** - To train agents to adapt
to changes in their environment (i.e., generalization), the agent should be exposed
to several variations of the environment. Similar to Curriculum Learning,
- **Training with Environment Parameter Randomization** - If an agent is exposed to several variations of an environment, it will be more robust (i.e. generalize better) to
unseen variations of the environment. Similar to Curriculum Learning,
a way to randomly sample Reset Parameters of the environment during training. See
[Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
a way to randomly sample parameters of the environment during training. See
[Training With Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md)
to learn more about this feature.
- **Cloud Training on AWS** - To facilitate using the ML-Agents toolkit on

2
docs/Readme.md


* [Training with Curriculum Learning](Training-Curriculum-Learning.md)
* [Training with Imitation Learning](Training-Imitation-Learning.md)
* [Training with LSTM](Feature-Memory.md)
* [Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
* [Training with Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md)
## Inference

4
docs/Training-Curriculum-Learning.md


measure by previous values.
* If `true`, weighting will be 0.75 (new) 0.25 (old).
* `parameters` (dictionary of key:string, value:float array) - Corresponds to
Academy reset parameters to control. Length of each array should be one
Environment parameters to control. Length of each array should be one
Once our curriculum is defined, we have to use the reset parameters we defined
Once our curriculum is defined, we have to use the environment parameters we defined
and modify the environment from the Agent's `OnEpisodeBegin()` function. See
[WallJumpAgent.cs](https://github.com/Unity-Technologies/ml-agents/blob/master/Project/Assets/ML-Agents/Examples/WallJump/Scripts/WallJumpAgent.cs)
for an example.

5
docs/Training-ML-Agents.md


lessons for curriculum training. See [Curriculum
Training](Training-Curriculum-Learning.md) for more information.
* `--sampler=<file>`: Specify a sampler YAML file for defining the
sampler for generalization training. See [Generalization
Training](Training-Generalized-Reinforcement-Learning-Agents.md) for more information.
sampler for parameter randomization. See [Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md) for more information.
* `--keep-checkpoints=<n>`: Specify the maximum number of model checkpoints to
keep. Checkpoints are saved after the number of steps specified by the
`save-freq` option. Once the maximum number of checkpoints has been reached,

* [Using Recurrent Neural Networks](Feature-Memory.md)
* [Training with Curriculum Learning](Training-Curriculum-Learning.md)
* [Training with Imitation Learning](Training-Imitation-Learning.md)
* [Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
* [Training with Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md)
You can also compare the
[example environments](Learning-Environment-Examples.md)

4
docs/Training-Self-Play.md


![Team ID](images/team_id.png)
***Team ID must be 0 or an integer greater than 0. Negative numbers will cause unpredictable behavior.*** See the trainer configuration and agent prefabs for our Tennis environment for an example.
***Team ID must be 0 or an integer greater than 0. Negative numbers will cause unpredictable behavior.***
See the trainer configuration and agent prefabs for our Tennis environment for an example.
## Best Practices Training with Self-Play

26
ml-agents/mlagents/trainers/behavior_id_utils.py


from typing import NamedTuple
from urllib.parse import urlparse, parse_qs
"""
BehaviorIdentifiers is a named tuple of the identifiers that uniquely distinguish
an agent encountered in the trainer_controller. The named tuple consists of the
fully qualified behavior name, the brain name (which corresponds to the trainer
in the trainer controller) and the team id. In the future, this can be extended
to support further identifiers.
"""
behavior_id: str
brain_name: str
team_id: int

"""
Parses a name_behavior_id of the form name?team=0&param1=i&...
Parses a name_behavior_id of the form name?team=0
This allows you to access the brain name and distinguishing identifiers
without parsing more than once.
This allows you to access the brain name and team id of an agent
parsed = urlparse(name_behavior_id)
name = parsed.path
ids = parse_qs(parsed.query)
if "?" in name_behavior_id:
name, team_and_id = name_behavior_id.rsplit("?", 1)
_, team_id_str = team_and_id.split("=")
team_id = int(team_id_str)
else:
name = name_behavior_id
if "team" in ids:
team_id = int(ids["team"][0])
return BehaviorIdentifiers(
behavior_id=name_behavior_id, brain_name=name, team_id=team_id
)
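Assembled from the fragments above, the new parsing path might look like the following minimal sketch; the free function `parse_behavior_id` is illustrative, not necessarily the trainer's actual entry point.

```python
from typing import NamedTuple


class BehaviorIdentifiers(NamedTuple):
    behavior_id: str
    brain_name: str
    team_id: int


def parse_behavior_id(name_behavior_id: str) -> BehaviorIdentifiers:
    # A name_behavior_id looks like "Soccer?team=0"; names without a
    # team suffix default to team 0.
    team_id = 0
    name = name_behavior_id
    if "?" in name_behavior_id:
        name, team_and_id = name_behavior_id.rsplit("?", 1)
        _, team_id_str = team_and_id.split("=")
        team_id = int(team_id_str)
    return BehaviorIdentifiers(
        behavior_id=name_behavior_id, brain_name=name, team_id=team_id
    )


# e.g. parse_behavior_id("Soccer?team=1") -> brain_name "Soccer", team_id 1
```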

35
ml-agents/mlagents/trainers/ghost/controller.py


class GhostController(object):
"""
GhostController contains a queue of team ids. GhostTrainers subscribe to the GhostController and query
it to get the current learning team. The GhostController cycles through team ids every 'swap_interval'
which corresponds to the number of trainer steps between changing learning teams.
"""
"""
Create a GhostController.
:param swap_interval: Number of trainer steps between changing learning teams.
:param maxlen: Maximum number of GhostTrainers allowed in this GhostController
"""
# Dict from team id to GhostTrainer
"""
Given a team_id and trainer, adds them to the queue and trainer registry if not already present.
The GhostTrainer is used later by the controller to get ELO ratings of agents.
:param team_id: The team_id of an agent managed by this GhostTrainer
:param trainer: A GhostTrainer that manages this team_id.
"""
if team_id not in self._ghost_trainers:
self._ghost_trainers[team_id] = trainer
if self._learning_team < 0:

def get_learning_team(self, step: int) -> int:
"""
Returns the current learning team. If 'swap_interval' steps have elapsed, the current
learning team is added to the end of the queue and then updated with the next in line.
:param step: Current step of the trainer.
:return: The learning team id
"""
if step >= self._swap_interval + self._last_swap:
self._last_swap = step
self._queue.append(self._learning_team)

# Adapted from https://github.com/Unity-Technologies/ml-agents/pull/1975 and
# https://metinmediamath.wordpress.com/2013/11/27/how-to-calculate-the-elo-rating-including-example/
# ELO calculation
# TODO : Generalize this to more than two teams
"""
Calculates the change in ELO given the rating of the learning team and the result. The GhostController
queries the other GhostTrainers for the ELO of their agent that is currently being deployed.
Note, this could be the current agent or a past snapshot.
:param rating: Rating of the learning team.
:param result: Win, loss, or draw from the perspective of the learning team.
:return: The change in ELO.
"""
opponent_rating: float = 0.0
for team_id, trainer in self._ghost_trainers.items():
if team_id != self._learning_team:
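For reference, a minimal sketch of the standard Elo update the docstring describes; this is not the trainer's actual code, and the `k_factor` default and result encoding are assumptions.

```python
def elo_change(rating: float, opponent_rating: float, result: float,
               k_factor: float = 16.0) -> float:
    # Expected score of the learning team against this opponent.
    expected = 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))
    # result is 1.0 for a win, 0.5 for a draw, 0.0 for a loss;
    # the change is positive when the team over-performs its expectation.
    return k_factor * (result - expected)
```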

8
ml-agents/mlagents/trainers/learn.py


)
argparser.add_argument(
"--ghost-swap",
"--team-change",
help="Number of trainer steps between swapping behavior id being ghosted",
help="Number of trainer steps between changing the team_id that is learning",
)
argparser.add_argument(

keep_checkpoints: int = parser.get_default("keep_checkpoints")
base_port: int = parser.get_default("base_port")
num_envs: int = parser.get_default("num_envs")
ghost_swap: int = parser.get_default("ghost_swap")
team_change: int = parser.get_default("team_change")
curriculum_config: Optional[Dict] = None
lesson: int = parser.get_default("lesson")
no_graphics: bool = parser.get_default("no_graphics")

options.keep_checkpoints,
options.train_model,
options.load_model,
options.ghost_swap,
options.team_change,
run_seed,
maybe_meta_curriculum,
options.multi_gpu,

2
ml-agents/mlagents/trainers/tests/test_subprocess_env_manager.py


# check the StatsReporter's debug stat writer's last reward.
assert isinstance(StatsReporter.writers[0], DebugWriter)
assert all(
val > 0.99 for val in StatsReporter.writers[0].get_last_rewards().values()
val > 0.7 for val in StatsReporter.writers[0].get_last_rewards().values()
)
env_manager.close()

5
ml-agents/mlagents/trainers/trainer_util.py


keep_checkpoints: int,
train_model: bool,
load_model: bool,
ghost_swap: int,
team_change: int,
seed: int,
meta_curriculum: MetaCurriculum = None,
multi_gpu: bool = False,

self.seed = seed
self.meta_curriculum = meta_curriculum
self.multi_gpu = multi_gpu
self.ghost_controller = GhostController(ghost_swap)
self.ghost_controller = GhostController(team_change)
def generate(self, brain_name: str) -> Trainer:
return initialize_trainer(

:param keep_checkpoints: How many model checkpoints to keep
:param train_model: Whether to train the model (vs. run inference)
:param load_model: Whether to load the model or randomly initialize
:param ghost_controller: The object that coordinates ghost trainers
:param seed: The random seed to use
:param meta_curriculum: Optional meta_curriculum, used to determine a reward buffer length for PPOTrainer
:return:

40
utils/make_readme_table.py


into the markdown file.
"""
from distutils.version import LooseVersion
from datetime import datetime
def table_line(version):
return f"| **{version}** | [source](https://github.com/Unity-Technologies/ml-agents/tree/{version}) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/{version}/docs) | [download](https://github.com/Unity-Technologies/ml-agents/archive/{version}.zip) |" # noqa
def table_line(display_name, name, date, bold=False):
bold_str = "**" if bold else ""
return f"| **{display_name}** | {bold_str}{date}{bold_str} | {bold_str}[source](https://github.com/Unity-Technologies/ml-agents/tree/{name}){bold_str} | {bold_str}[docs](https://github.com/Unity-Technologies/ml-agents/tree/{name}/docs/Readme.md){bold_str} | {bold_str}[download](https://github.com/Unity-Technologies/ml-agents/archive/{name}.zip){bold_str} |" # noqa
"0.10.0",
"0.10.1",
"0.11.0",
"0.12.0",
"0.12.1",
"0.13.0",
"0.13.1",
"0.14.0",
["0.10.0", "September 30, 2019"],
["0.10.1", "October 9, 2019"],
["0.11.0", "November 4, 2019"],
["0.12.0", "December 2, 2019"],
["0.12.1", "December 11, 2019"],
["0.13.0", "January 8, 2020"],
["0.13.1", "January 21, 2020"],
["0.14.0", "February 13, 2020"],
["0.14.1", "February 26, 2020"],
["0.15.0", "March 18, 2020"],
sorted_versions = sorted((LooseVersion(v) for v in versions), reverse=True)
MAX_DAYS = 150 # do not print releases older than this many days
sorted_versions = sorted(
([LooseVersion(v[0]), v[1]] for v in versions), key=lambda x: x[0], reverse=True
)
for v in sorted_versions:
print(table_line(str(v)))
print(table_line("master (unstable)", "master", "--"))
highlight = True # whether to bold the line or not
for version_name, version_date in sorted_versions:
elapsed_days = (
datetime.today() - datetime.strptime(version_date, "%B %d, %Y")
).days
if elapsed_days <= MAX_DAYS:
print(table_line(version_name, version_name, version_date, highlight))
highlight = False # only bold the first stable release
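Assembling the fragments above into a condensed, runnable sketch (the real script's surrounding details, such as the full version list and module docstring, may differ):

```python
from datetime import datetime
from distutils.version import LooseVersion

def table_line(display_name, name, date, bold=False):
    b = "**" if bold else ""
    return (
        f"| **{display_name}** | {b}{date}{b} | "
        f"{b}[source](https://github.com/Unity-Technologies/ml-agents/tree/{name}){b} | "
        f"{b}[docs](https://github.com/Unity-Technologies/ml-agents/tree/{name}/docs/Readme.md){b} | "
        f"{b}[download](https://github.com/Unity-Technologies/ml-agents/archive/{name}.zip){b} |"
    )

versions = [
    ["0.14.1", "February 26, 2020"],
    ["0.15.0", "March 18, 2020"],
]
MAX_DAYS = 150  # do not print releases older than this many days

print(table_line("master (unstable)", "master", "--"))
highlight = True  # bold only the newest recent release
for version, date in sorted(
    ([LooseVersion(v[0]), v[1]] for v in versions), key=lambda x: x[0], reverse=True
):
    if (datetime.today() - datetime.strptime(date, "%B %d, %Y")).days <= MAX_DAYS:
        print(table_line(str(version), str(version), date, highlight))
        highlight = False
```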

171
docs/Training-Environment-Parameter-Randomization.md


# Training With Environment Parameter Randomization
One of the challenges of training and testing agents on the same
environment is that the agents tend to overfit. The result is that the
agents are unable to generalize to any tweaks or variations in the environment.
This is analogous to a model being trained and tested on an identical dataset
in supervised learning. This becomes problematic in cases where environments
are instantiated with varying objects or properties.
To make agents robust and generalize better to changes in the environment, the agent
can be trained over multiple variations of a given environment. We refer to this approach as **Environment Parameter Randomization**. For those familiar with Reinforcement Learning research, this approach is based on the concept of Domain Randomization (you can read more about it [here](https://arxiv.org/abs/1703.06907)). By using parameter randomization
during training, the agent can be better suited to adapt (with higher performance)
to future unseen variations of the environment.
_Example of variations of the 3D Ball environment._
Ball scale of 0.5 | Ball scale of 4
:-------------------------:|:-------------------------:
![](images/3dball_small.png) | ![](images/3dball_big.png)
To enable variations in the environments, we implemented `Environment Parameters`.
`Environment Parameters` are `Academy.Instance.FloatProperties` that can be read when setting
up the environment. We
also included different sampling methods and the ability to create new kinds of
sampling methods for each `Environment Parameter`. In the 3D ball environment example displayed
in the figure above, the environment parameters are `gravity`, `ball_mass` and `ball_scale`.
## How to Enable Environment Parameter Randomization
We first need to provide a way to modify the environment by supplying a set of `Environment Parameters`
and vary them over time. This provision can be done either deterministically or randomly.
This is done by assigning each `Environment Parameter` a `sampler-type` (such as a uniform sampler),
which determines how to sample an `Environment
Parameter`. If a `sampler-type` isn't provided for an
`Environment Parameter`, the parameter maintains the default value throughout the
training procedure, remaining unchanged. The samplers for all the `Environment Parameters`
are handled by a **Sampler Manager**, which also handles the generation of new
values for the environment parameters when needed.
To set up the Sampler Manager, we create a YAML file that specifies how we wish to
generate new samples for each `Environment Parameter`. In this file, we specify the samplers and the
`resampling-interval` (the number of simulation steps after which environment parameters are
resampled). Below is an example of a sampler file for the 3D ball environment.
```yaml
resampling-interval: 5000
mass:
    sampler-type: "uniform"
    min_value: 0.5
    max_value: 10
gravity:
    sampler-type: "multirange_uniform"
    intervals: [[7, 10], [15, 20]]
scale:
    sampler-type: "uniform"
    min_value: 0.75
    max_value: 3
```
Below is the explanation of the fields in the above example.
* `resampling-interval` - Specifies the number of steps for the agent to
train under a particular environment configuration before resetting the
environment with a new sample of `Environment Parameters`.
* `Environment Parameter` - Name of the `Environment Parameter` like `mass`, `gravity` and `scale`. This should match the name
specified in the `FloatProperties` of the environment being trained. If a parameter specified in the file doesn't exist in the
environment, then this parameter will be ignored. Within each `Environment Parameter`
* `sampler-type` - Specify the sampler type to use for the `Environment Parameter`.
This is a string that should exist in the `Sampler Factory` (explained
below).
* `sampler-type-sub-arguments` - Specify the sub-arguments depending on the `sampler-type`.
In the example above, this would correspond to the `intervals`
under the `sampler-type` `"multirange_uniform"` for the `Environment Parameter` called `gravity`.
The key name should match the name of the corresponding argument in the sampler definition.
(See below)
The Sampler Manager allocates a sampler type for each `Environment Parameter` by using the *Sampler Factory*,
which maintains a dictionary mapping of string keys to sampler objects. The sampler types
available for use with each `Environment Parameter` are those registered in the Sampler Factory.
### Included Sampler Types
Below is a list of the `sampler-type` values included in the toolkit.
* `uniform` - Uniform sampler
* Uniformly samples a single float value between defined endpoints.
The sub-arguments for this sampler to specify the interval
endpoints are as below. The sampling is done in the range of
[`min_value`, `max_value`).
* **sub-arguments** - `min_value`, `max_value`
* `gaussian` - Gaussian sampler
* Samples a single float value from the distribution characterized by
the mean and standard deviation. The sub-arguments to specify the
gaussian distribution to use are as below.
* **sub-arguments** - `mean`, `st_dev`
* `multirange_uniform` - Multirange uniform sampler
* Uniformly samples a single float value between the specified intervals.
Samples by first performing a weighted pick of an interval from the list
of intervals (weighted based on interval width) and then samples uniformly
from the selected interval (half-closed interval, same as the uniform
sampler). This sampler can take an arbitrary number of intervals in a
list in the following format:
[[`interval_1_min`, `interval_1_max`], [`interval_2_min`, `interval_2_max`], ...]
* **sub-arguments** - `intervals`
The implementation of the samplers can be found at `ml-agents-envs/mlagents_envs/sampler_class.py`.
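As an illustration of the `multirange_uniform` behavior described above, here is a small sketch; it is not the toolkit's implementation in `sampler_class.py`, and the function name is hypothetical.

```python
import random

def multirange_uniform_sample(intervals):
    # Pick an interval with probability proportional to its width,
    # then sample uniformly within the chosen interval.
    widths = [hi - lo for lo, hi in intervals]
    lo, hi = random.choices(intervals, weights=widths, k=1)[0]
    return random.uniform(lo, hi)

# e.g. multirange_uniform_sample([[7, 10], [15, 20]]) for the `gravity` example above
```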
### Defining a New Sampler Type
If you want to define your own sampler type, you must first inherit from the *Sampler*
base class (included in the `sampler_class` file) and preserve its interface.
Once the class with the required methods is defined, it must be registered in the Sampler Factory.
This is done by calling the *register_sampler* method of the SamplerFactory. The command
is as follows:
`SamplerFactory.register_sampler(*custom_sampler_string_key*, *custom_sampler_object*)`
Once the Sampler Factory reflects the new registration, the new sampler type can be used to sample any
`Environment Parameter`. For example, let's say a new sampler type was implemented as below and we registered
the `CustomSampler` class with the string `custom-sampler` in the Sampler Factory.
```python
import numpy as np

class CustomSampler(Sampler):
    def __init__(self, argA, argB, argC):
        self.possible_vals = [argA, argB, argC]

    def sample_all(self):
        # Return one of the configured values, chosen uniformly at random.
        return np.random.choice(self.possible_vals)
```
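Registering it under the string used above would then look like the following one-liner (assuming `SamplerFactory` is importable from the sampler module):

```python
SamplerFactory.register_sampler("custom-sampler", CustomSampler)
```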
Now we need to specify the new sampler type in the sampler YAML file. For example, we use this new
sampler type for the `Environment Parameter` *mass*.
```yaml
mass:
    sampler-type: "custom-sampler"
    argB: 1
    argA: 2
    argC: 3
```
### Training with Environment Parameter Randomization
After the sampler YAML file is defined, we proceed by launching `mlagents-learn` and specifying
our configured sampler file with the `--sampler` flag. For example, if we wanted to train the
3D ball agent with parameter randomization using `Environment Parameters` with `config/3dball_randomize.yaml`
sampling setup, we would run
```sh
mlagents-learn config/trainer_config.yaml --sampler=config/3dball_randomize.yaml --run-id=3D-Ball-randomize --train
```
We can observe progress and metrics via TensorBoard.

173
docs/Training-Generalized-Reinforcement-Learning-Agents.md


# Training Generalized Reinforcement Learning Agents
One of the challenges of training and testing agents on the same
environment is that the agents tend to overfit. The result is that the
agents are unable to generalize to any tweaks or variations in the environment.
This is analogous to a model being trained and tested on an identical dataset
in supervised learning. This becomes problematic in cases where environments
are randomly instantiated with varying objects or properties.
To make agents robust and generalizable to different environments, the agent
should be trained over multiple variations of the environment. Using this approach
for training, the agent will be better suited to adapt (with higher performance)
to future unseen variations of the environment.
_Example of variations of the 3D Ball environment._
Ball scale of 0.5 | Ball scale of 4
:-------------------------:|:-------------------------:
![](images/3dball_small.png) | ![](images/3dball_big.png)
## Introducing Generalization Using Reset Parameters
To enable variations in the environments, we implemented `Reset Parameters`.
`Reset Parameters` are `Academy.Instance.FloatProperties` that are used only when
resetting the environment. We
also included different sampling methods and the ability to create new kinds of
sampling methods for each `Reset Parameter`. In the 3D ball environment example displayed
in the figure above, the reset parameters are `gravity`, `ball_mass` and `ball_scale`.
## How to Enable Generalization Using Reset Parameters
We first need to provide a way to modify the environment by supplying a set of `Reset Parameters`
and vary them over time. This provision can be done either deterministically or randomly.
This is done by assigning each `Reset Parameter` a `sampler-type` (such as a uniform sampler),
which determines how to sample a `Reset
Parameter`. If a `sampler-type` isn't provided for a
`Reset Parameter`, the parameter maintains the default value throughout the
training procedure, remaining unchanged. The samplers for all the `Reset Parameters`
are handled by a **Sampler Manager**, which also handles the generation of new
values for the reset parameters when needed.
To set up the Sampler Manager, we create a YAML file that specifies how we wish to
generate new samples for each `Reset Parameter`. In this file, we specify the samplers and the
`resampling-interval` (the number of simulation steps after which reset parameters are
resampled). Below is an example of a sampler file for the 3D ball environment.
```yaml
resampling-interval: 5000
mass:
    sampler-type: "uniform"
    min_value: 0.5
    max_value: 10
gravity:
    sampler-type: "multirange_uniform"
    intervals: [[7, 10], [15, 20]]
scale:
    sampler-type: "uniform"
    min_value: 0.75
    max_value: 3
```
Below is the explanation of the fields in the above example.
* `resampling-interval` - Specifies the number of steps for the agent to
train under a particular environment configuration before resetting the
environment with a new sample of `Reset Parameters`.
* `Reset Parameter` - Name of the `Reset Parameter` like `mass`, `gravity` and `scale`. This should match the name
specified in the academy of the intended environment for which the agent is
being trained. If a parameter specified in the file doesn't exist in the
environment, then this parameter will be ignored. Within each `Reset Parameter`
* `sampler-type` - Specify the sampler type to use for the `Reset Parameter`.
This is a string that should exist in the `Sampler Factory` (explained
below).
* `sampler-type-sub-arguments` - Specify the sub-arguments depending on the `sampler-type`.
In the example above, this would correspond to the `intervals`
under the `sampler-type` `"multirange_uniform"` for the `Reset Parameter` called `gravity`.
The key name should match the name of the corresponding argument in the sampler definition.
(See below)
The Sampler Manager allocates a sampler type for each `Reset Parameter` by using the *Sampler Factory*,
which maintains a dictionary mapping of string keys to sampler objects. The sampler types
available for use with each `Reset Parameter` are those registered in the Sampler Factory.
### Included Sampler Types
Below is a list of the `sampler-type` values included in the toolkit.
* `uniform` - Uniform sampler
* Uniformly samples a single float value between defined endpoints.
The sub-arguments for this sampler to specify the interval
endpoints are as below. The sampling is done in the range of
[`min_value`, `max_value`).
* **sub-arguments** - `min_value`, `max_value`
* `gaussian` - Gaussian sampler
* Samples a single float value from the distribution characterized by
the mean and standard deviation. The sub-arguments to specify the
gaussian distribution to use are as below.
* **sub-arguments** - `mean`, `st_dev`
* `multirange_uniform` - Multirange uniform sampler
* Uniformly samples a single float value between the specified intervals.
Samples by first performing a weighted pick of an interval from the list
of intervals (weighted based on interval width) and then samples uniformly
from the selected interval (half-closed interval, same as the uniform
sampler). This sampler can take an arbitrary number of intervals in a
list in the following format:
[[`interval_1_min`, `interval_1_max`], [`interval_2_min`, `interval_2_max`], ...]
* **sub-arguments** - `intervals`
The implementation of the samplers can be found at `ml-agents-envs/mlagents_envs/sampler_class.py`.
### Defining a New Sampler Type
If you want to define your own sampler type, you must first inherit from the *Sampler*
base class (included in the `sampler_class` file) and preserve its interface.
Once the class with the required methods is defined, it must be registered in the Sampler Factory.
This is done by calling the *register_sampler* method of the SamplerFactory. The command
is as follows:
`SamplerFactory.register_sampler(*custom_sampler_string_key*, *custom_sampler_object*)`
Once the Sampler Factory reflects the new registration, the new sampler type can be used to sample any
`Reset Parameter`. For example, let's say a new sampler type was implemented as below and we registered
the `CustomSampler` class with the string `custom-sampler` in the Sampler Factory.
```python
import numpy as np

class CustomSampler(Sampler):
    def __init__(self, argA, argB, argC):
        self.possible_vals = [argA, argB, argC]

    def sample_all(self):
        # Return one of the configured values, chosen uniformly at random.
        return np.random.choice(self.possible_vals)
```
Now we need to specify the new sampler type in the sampler YAML file. For example, we use this new
sampler type for the `Reset Parameter` *mass*.
```yaml
mass:
    sampler-type: "custom-sampler"
    argB: 1
    argA: 2
    argC: 3
```
### Training with Generalization Using Reset Parameters
After the sampler YAML file is defined, we proceed by launching `mlagents-learn` and specifying
our configured sampler file with the `--sampler` flag. For example, if we wanted to train the
3D ball agent with generalization using `Reset Parameters` with `config/3dball_generalize.yaml`
sampling setup, we would run
```sh
mlagents-learn config/trainer_config.yaml --sampler=config/3dball_generalize.yaml --run-id=3D-Ball-generalization --train
```
We can observe progress and metrics via TensorBoard.

/config/3dball_generalize.yaml → /config/3dball_randomize.yaml
