
Merge pull request #1932 from Unity-Technologies/release-v0.8

Release v0.8
/develop-generalizationTraining-TrainerController
GitHub · 6 years ago
Current commit
ba57eaad
39 files changed, with 7,089 insertions and 6,441 deletions
  1. README.md (2)
  2. UnitySDK/Assets/ML-Agents/Examples/3DBall/TFModels/3DBallHardLearning.nn (1001)
  3. UnitySDK/Assets/ML-Agents/Examples/3DBall/TFModels/3DBallLearning.nn (1001)
  4. UnitySDK/Assets/ML-Agents/Examples/BananaCollectors/TFModels/BananaLearning.nn (622)
  5. UnitySDK/Assets/ML-Agents/Examples/Basic/TFModels/BasicLearning.nn (22)
  6. UnitySDK/Assets/ML-Agents/Examples/Bouncer/TFModels/BouncerLearning.nn (272)
  7. UnitySDK/Assets/ML-Agents/Examples/Crawler/TFModels/CrawlerDynamicLearning.nn (1001)
  8. UnitySDK/Assets/ML-Agents/Examples/Crawler/TFModels/CrawlerStaticLearning.nn (1001)
  9. UnitySDK/Assets/ML-Agents/Examples/GridWorld/TFModels/GridWorldLearning.nn (997)
  10. UnitySDK/Assets/ML-Agents/Examples/Hallway/TFModels/HallwayLearning.nn (999)
  11. UnitySDK/Assets/ML-Agents/Examples/PushBlock/TFModels/PushBlockLearning.nn (1000)
  12. UnitySDK/Assets/ML-Agents/Examples/Pyramids/TFModels/PyramidsLearning.nn (1000)
  13. UnitySDK/Assets/ML-Agents/Examples/Reacher/TFModels/ReacherLearning.nn (1001)
  14. UnitySDK/Assets/ML-Agents/Examples/Tennis/TFModels/TennisLearning.nn (1001)
  15. UnitySDK/Assets/ML-Agents/Examples/WallJump/TFModels/BigWallJumpLearning.nn (1001)
  16. UnitySDK/Assets/ML-Agents/Examples/WallJump/TFModels/SmallWallJumpLearning.nn (1001)
  17. UnitySDK/Assets/ML-Agents/Scripts/Academy.cs (70)
  18. docs/Installation.md (4)
  19. docs/Learning-Environment-Examples.md (2)
  20. docs/Migrating.md (34)
  21. docs/Readme.md (2)
  22. docs/Training-ML-Agents.md (8)
  23. gym-unity/setup.py (4)
  24. ml-agents-envs/mlagents/envs/environment.py (13)
  25. ml-agents-envs/mlagents/envs/mock_communicator.py (2)
  26. ml-agents-envs/mlagents/envs/subprocess_environment.py (29)
  27. ml-agents-envs/setup.py (6)
  28. ml-agents/mlagents/trainers/learn.py (26)
  29. ml-agents/mlagents/trainers/tests/test_learn.py (7)
  30. ml-agents/mlagents/trainers/tests/test_trainer_controller.py (5)
  31. ml-agents/mlagents/trainers/trainer_controller.py (28)
  32. ml-agents/setup.py (4)
  33. docs/Creating-Custom-Protobuf-Messages.md (168)
  34. docs/Training-Using-Concurrent-Unity-Instances.md (25)
  35. docs/Using-TensorFlow-Sharp-in-Unity.md (4)
  36. docs/Custom-Protos.md (167)
  37. /ml-agents/mlagents/trainers/tests/test_trainer_metrics.py (0)
  38. /ml-agents-envs/mlagents/envs/tests/test_subprocess_unity_environment.py (0)

2
README.md


* Visualizing network outputs within the environment
* Simplified set-up with Docker
* Wrap learning environments as a gym
* Utilizes the Unity Inference Engine
* Train using concurrent Unity environment instances
## Documentation

1001
UnitySDK/Assets/ML-Agents/Examples/3DBall/TFModels/3DBallHardLearning.nn
Diff too large to display.

1001
UnitySDK/Assets/ML-Agents/Examples/3DBall/TFModels/3DBallLearning.nn
Diff too large to display.

622
UnitySDK/Assets/ML-Agents/Examples/BananaCollectors/TFModels/BananaLearning.nn
Diff too large to display.

22
UnitySDK/Assets/ML-Agents/Examples/Basic/TFModels/BasicLearning.nn


(Binary .nn model file; contents not shown.)

272
UnitySDK/Assets/ML-Agents/Examples/Bouncer/TFModels/BouncerLearning.nn
Diff too large to display.

1001
UnitySDK/Assets/ML-Agents/Examples/Crawler/TFModels/CrawlerDynamicLearning.nn
Diff too large to display.

1001
UnitySDK/Assets/ML-Agents/Examples/Crawler/TFModels/CrawlerStaticLearning.nn
Diff too large to display.

997
UnitySDK/Assets/ML-Agents/Examples/GridWorld/TFModels/GridWorldLearning.nn
Diff too large to display.

999
UnitySDK/Assets/ML-Agents/Examples/Hallway/TFModels/HallwayLearning.nn
Diff too large to display.

1000
UnitySDK/Assets/ML-Agents/Examples/PushBlock/TFModels/PushBlockLearning.nn
Diff too large to display.

1000
UnitySDK/Assets/ML-Agents/Examples/Pyramids/TFModels/PyramidsLearning.nn
Diff too large to display.

1001
UnitySDK/Assets/ML-Agents/Examples/Reacher/TFModels/ReacherLearning.nn
Diff too large to display.

1001
UnitySDK/Assets/ML-Agents/Examples/Tennis/TFModels/TennisLearning.nn
Diff too large to display.

1001
UnitySDK/Assets/ML-Agents/Examples/WallJump/TFModels/BigWallJumpLearning.nn
Diff too large to display.

1001
UnitySDK/Assets/ML-Agents/Examples/WallJump/TFModels/SmallWallJumpLearning.nn
Diff too large to display.

70
UnitySDK/Assets/ML-Agents/Scripts/Academy.cs


/**
* Welcome to Unity Machine Learning Agents (ML-Agents).
*
*
* The ML-Agents toolkit contains five entities: Academy, Brain, Agent, Communicator and
* Python API. The academy, and all its brains and connected agents live within
* a learning environment (herein called Environment), while the communicator

[Tooltip("Frames per second (FPS) engine attempts to maintain.")]
public int targetFrameRate;
/// Initializes a new instance of the
/// <see cref="EnvironmentConfiguration"/> class.
/// <param name="width">Width of environment window (pixels).</param>
/// <param name="height">Height of environment window (pixels).</param>

}
/// <summary>
/// An Academy is where Agent objects go to train their behaviors. More
/// in a scene is attached to one brain (a single brain may be attached to
/// multiple agents). Currently, this class is expected to be extended to
/// implement the desired academy behavior.
/// </summary>

"docs/Learning-Environment-Design-Academy.md")]
public abstract class Academy : MonoBehaviour
{
[SerializeField]
private const string kApiVersion = "API-7";
private const string kApiVersion = "API-8";
/// Used to restore original value when deriving Academy modifies it
private float originalMaximumDeltaTime;
// Fields provided in the Inspector

/// <summary/>
/// <remarks>
/// Default reset parameters are specified in the academy Editor, and can
/// be modified when training with an external Brain by passing a config
/// dictionary at reset.
/// </remarks>
[SerializeField]
[Tooltip("List of custom parameters that can be changed in the " +

/// the same message is not used multiple times.
private ulong lastCommunicatorMessageNumber;
/// If true, the Academy will use inference settings. This field is
/// initialized in <see cref="Awake"/> depending on the presence
/// or absence of a communicator. Furthermore, it can be modified by an
/// external Brain during reset via <see cref="SetIsInference"/>.

/// current episode.
bool maxStepReached;
/// The number of episodes completed by the environment. Incremented
/// each time a step is taken in the environment. Is reset to 0 during
/// <see cref="AcademyReset"/>.
int stepCount;

// The Academy uses a series of events to communicate with agents and
// brains to facilitate synchronization. More specifically, it ensures
// that all the agents perform their steps in a consistent order (i.e. no
// agent can act based on a decision before another agent has had a chance
// Signals to all the Brains at each environment step so they can decide
// actions for their agents.
public event System.Action BrainDecideAction;

// Signals to all the agents at each environment step along with the
// Academy's maxStepReached, done and stepCount values. The agents rely
// on this event to update their own values of max step reached and done
// in addition to aligning on the step count of the global episode.

// if their flag has been set to done (assuming the agent has requested a
// decision).
public event System.Action AgentResetIfDone;

originalGravity = Physics.gravity;
originalFixedDeltaTime = Time.fixedDeltaTime;
originalMaximumDeltaTime = Time.maximumDeltaTime;
InitializeAcademy();
Communicator communicator = null;

{
brain.SetToControlledExternally();
}
// Try to launch the communicator by using the arguments passed at launch
try
{

Random.InitState(pythonParameters.Seed);
Application.logMessageReceived += HandleLog;
logPath = Path.GetFullPath(".") + "/UnitySDK.log";
logWriter = new StreamWriter(logPath, false);
logWriter.WriteLine(System.DateTime.Now.ToString());
logWriter.WriteLine(" ");
logWriter.Close();
using (var fs = File.Open(logPath, FileMode.Append, FileAccess.Write, FileShare.ReadWrite))
{
logWriter = new StreamWriter(fs);
logWriter.WriteLine(System.DateTime.Now.ToString());
logWriter.WriteLine(" ");
logWriter.Close();
}
}
// If a communicator is enabled/provided, then we assume we are in

void HandleLog(string logString, string stackTrace, LogType type)
{
logWriter = new StreamWriter(logPath, true);
logWriter.WriteLine(type.ToString());
logWriter.WriteLine(logString);
logWriter.WriteLine(stackTrace);
logWriter.Close();
using (var fs = File.Open(logPath, FileMode.Append, FileAccess.Write, FileShare.ReadWrite))
{
logWriter = new StreamWriter(fs);
logWriter.WriteLine(type.ToString());
logWriter.WriteLine(logString);
logWriter.WriteLine(stackTrace);
logWriter.Close();
}
}
/// <summary>

}
/// <summary>
/// Forces the full reset. The done flags are not affected. Is either
/// called the first reset at inference and every external reset
/// at training.
/// </summary>

4
docs/Installation.md


If you intend to make modifications to `ml-agents` or `ml-agents-envs`, you should install
the packages from the cloned repo rather than from PyPi. To do this, you will need to install
`ml-agents` and `ml-agents-envs` separately. Do this by running (starting from the repo's main
directory):
`ml-agents` and `ml-agents-envs` separately. From the repo's root directory, run:
```sh
cd ml-agents-envs
```

reflected when you run `mlagents-learn`. It is important to install these packages in this order as the
`mlagents` package depends on `mlagents_envs`, and installing it in the other
order will download `mlagents_envs` from PyPi.
## Docker-based Installation

2
docs/Learning-Environment-Examples.md


the jump.
* Visual Observations: None.
* Reset Parameters: None.
* Benchmark Mean Reward: 2.5
* Benchmark Mean Reward: 10
## [Soccer Twos](https://youtu.be/Hg3nmYD3DjQ)

34
docs/Migrating.md


# Migrating
## Migrating from ML-Agents toolkit v0.7 to v0.8
### Important Changes
* We have split the Python packages into two separate packages `ml-agents` and `ml-agents-envs`.
* `--worker-id` option of `learn.py` has been removed, use `--base-port` instead if you'd like to run multiple instances of `learn.py`.
#### Steps to Migrate
* If you are installing via PyPI, there is no change.
* If you intend to make modifications to `ml-agents` or `ml-agents-envs`, please check the Installing for Development section in the [Installation documentation](Installation.md).
## Migrating from ML-Agents toolkit v0.6 to v0.7
### Important Changes
* We no longer support TFS and are now using the [Unity Inference Engine](Unity-Inference-Engine.md)
#### Steps to Migrate
* Make sure to remove the `ENABLE_TENSORFLOW` flag in your Unity Project settings
* Brains are now Scriptable Objects instead of MonoBehaviors.
__Note:__ You can pass the same Brain to multiple agents in a scene by
leveraging Unity's prefab system or by looking for all the agents in a scene
using the search bar of the `Hierarchy` window with the word `Agent`.

* We removed the `Broadcast` checkbox of the Brain, to use the broadcast
* When training multiple Brains at the same time, each model is now stored
graph scopes.
* The **Learning Brain** graph scope, placeholder names, output names and custom
placeholders can no longer be modified.

* Agents have a `Brain` field in the Inspector, you need to drag the
appropriate Brain ScriptableObject in it.
* The Academy has a `Broadcast Hub` field in the inspector, which is a
list of brains used in the scene. To train or control your Brain
from the `mlagents-learn` Python script, you need to drag the relevant
`LearningBrain` ScriptableObjects used in your scene into entries
into this list.
## Migrating from ML-Agents toolkit v0.4 to v0.5

2
docs/Readme.md


* [Learning Environment Best Practices](Learning-Environment-Best-Practices.md)
* [Using the Monitor](Feature-Monitor.md)
* [Using an Executable Environment](Learning-Environment-Executable.md)
* [Creating Custom Protobuf Messages](Creating-Custom-Protobuf-Messages.md)
## Training

* [Training with LSTM](Feature-Memory.md)
* [Training on the Cloud with Amazon Web Services](Training-on-Amazon-Web-Service.md)
* [Training on the Cloud with Microsoft Azure](Training-on-Microsoft-Azure.md)
* [Training Using Concurrent Unity Instances](Training-Using-Concurrent-Unity-Instances.md)
* [Using TensorBoard to Observe Training](Using-Tensorboard.md)
## Inference

8
docs/Training-ML-Agents.md


[Academy Properties](Learning-Environment-Design-Academy.md#academy-properties).
* `--train` – Specifies whether to train model or only run in inference mode.
When training, **always** use the `--train` option.
* `--worker-id=<n>` – When you are running more than one training environment at
the same time, assign each a unique worker-id number. The worker-id is added
to the communication port opened between the current instance of
`mlagents-learn` and the ExternalCommunicator object in the Unity environment.
Defaults to 0.
* `--num-envs=<n>` - Specifies the number of concurrent Unity environment instances to collect
experiences from when training. Defaults to 1.
* `--base-port` - Specifies the starting port. Each concurrent Unity environment instance is assigned a port sequentially, starting from `base-port`. Each instance uses the port `(base_port + worker_id)`, where `worker_id` is a sequential ID given to each instance, from 0 to `num_envs - 1`. Default is 5005. (A worked example follows below.)
* `--docker-target-name=<dt>` – The Docker Volume on which to store curriculum,
executable and model files. See [Using Docker](Using-Docker.md).
* `--no-graphics` - Specify this option to run the Unity executable in
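To make the port rule above concrete, here is a minimal sketch (not part of the toolkit) that computes the ports one run would occupy, assuming the default base port of 5005 and `--num-envs=4`:

```python
# Hypothetical helper mirroring the rule described for --base-port:
# each concurrent Unity instance listens on base_port + worker_id,
# with worker_id running from 0 to num_envs - 1.
def environment_ports(base_port=5005, num_envs=4):
    return [base_port + worker_id for worker_id in range(num_envs)]

print(environment_ports())  # [5005, 5006, 5007, 5008]
```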

4
gym-unity/setup.py


from setuptools import setup, find_packages
setup(name='gym_unity',
version='0.3.0',
version='0.4.0',
description='Unity Machine Learning Agents Gym Interface',
license='Apache License 2.0',
author='Unity Technologies',

install_requires=['gym', 'mlagents_envs']
install_requires=['gym', 'mlagents_envs==0.8.0']
)

13
ml-agents-envs/mlagents/envs/environment.py


seed: int = 0,
docker_training: bool = False,
no_graphics: bool = False,
timeout_wait: int = 30,
train_mode: bool = True):
timeout_wait: int = 30):
"""
Starts a new unity environment and establishes a connection with the environment.
Notice: Currently communication between Unity and Python takes place over an open socket without authentication.

atexit.register(self._close)
self.port = base_port + worker_id
self._buffer_size = 12000
self._version_ = "API-7"
self._version_ = "API-8"
self._train_mode = train_mode
# If the environment name is None, a new environment will not be launched
# and the communicator will directly try to connect to an existing unity environment.

for k in self._resetParameters])) + '\n' + \
'\n'.join([str(self._brains[b]) for b in self._brains])
def reset(self, config=None, train_mode=None, custom_reset_parameters=None) -> AllBrainInfo:
def reset(self, config=None, train_mode=True, custom_reset_parameters=None) -> AllBrainInfo:
"""
Sends a signal to reset the unity environment.
:return: AllBrainInfo : A data structure corresponding to the initial reset state of the environment.

else:
raise UnityEnvironmentException(
"The parameter '{0}' is not a valid parameter.".format(k))
if train_mode is None:
train_mode = self._train_mode
else:
self._train_mode = train_mode
if self._loaded:
outputs = self.communicator.exchange(
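For reference, a minimal usage sketch of the constructor and `reset` signatures touched by this diff; the build path and parameter values below are placeholders, not part of the change:

```python
from mlagents.envs import UnityEnvironment

# Hypothetical build path; the instance listens on base_port + worker_id.
env = UnityEnvironment(file_name="./builds/3DBall", worker_id=0,
                       base_port=5005, no_graphics=False, timeout_wait=30)
all_brain_info = env.reset(train_mode=True)  # initial AllBrainInfo for every brain
env.close()
```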

2
ml-agents-envs/mlagents/envs/mock_communicator.py


)
rl_init = UnityRLInitializationOutput(
name="RealFakeAcademy",
version="API-7",
version="API-8",
log_path="",
brain_parameters=[bp]
)

29
ml-agents-envs/mlagents/envs/subprocess_environment.py


from typing import *
import copy
import numpy as np
import cloudpickle
from mlagents.envs import UnityEnvironment
from multiprocessing import Process, Pipe

conn: Connection
def send(self, name: str, payload=None):
cmd = EnvironmentCommand(name, payload)
self.conn.send(cmd)
try:
cmd = EnvironmentCommand(name, payload)
self.conn.send(cmd)
except (BrokenPipeError, EOFError):
raise KeyboardInterrupt
response: EnvironmentResponse = self.conn.recv()
return response
try:
response: EnvironmentResponse = self.conn.recv()
return response
except (BrokenPipeError, EOFError):
raise KeyboardInterrupt
try:
self.conn.send(EnvironmentCommand('close'))
except (BrokenPipeError, EOFError):
pass
def worker(parent_conn: Connection, env_factory: Callable[[int], UnityEnvironment], worker_id: int):
def worker(parent_conn: Connection, pickled_env_factory: str, worker_id: int):
env_factory: Callable[[int], UnityEnvironment] = cloudpickle.loads(pickled_env_factory)
env = env_factory(worker_id)
def _send_response(cmd_name, payload):

elif cmd.name == 'global_done':
_send_response('global_done', env.global_done)
elif cmd.name == 'close':
env.close()
break
except KeyboardInterrupt:
print('UnityEnvironment worker: keyboard interrupt')

env_factory: Callable[[int], BaseUnityEnvironment]
) -> UnityEnvWorker:
parent_conn, child_conn = Pipe()
child_process = Process(target=worker, args=(child_conn, env_factory, worker_id))
# Need to use cloudpickle for the env factory function since function objects aren't picklable
# on Windows as of Python 3.6.
pickled_env_factory = cloudpickle.dumps(env_factory)
child_process = Process(target=worker, args=(child_conn, pickled_env_factory, worker_id))
child_process.start()
return UnityEnvWorker(child_process, worker_id, parent_conn)
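As context for the change above: a self-contained sketch (independent of the ML-Agents worker code) of why a locally defined environment factory is serialized with `cloudpickle` before being handed to the child process; `make_factory` here is purely illustrative:

```python
import cloudpickle

def make_factory(base_port):
    # A closure like this stands in for the real env factory; per the comment
    # above, plain function objects aren't picklable on Windows as of Python 3.6,
    # so cloudpickle is used to serialize it instead.
    def factory(worker_id):
        return {"port": base_port + worker_id}  # placeholder for UnityEnvironment(...)
    return factory

pickled = cloudpickle.dumps(make_factory(5005))  # bytes, safe to pass as a Process arg
restored = cloudpickle.loads(pickled)
print(restored(2))  # {'port': 5007}
```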

6
ml-agents-envs/setup.py


setup(
name='mlagents_envs',
version='0.7.0',
version='0.8.0',
description='Unity Machine Learning Agents Interface',
url='https://github.com/Unity-Technologies/ml-agents',
author='Unity Technologies',

'numpy>=1.13.3,<=1.16.1',
'pytest>=3.2.2,<4.0.0',
'protobuf>=3.6,<3.7',
'grpcio>=1.11.0,<1.12.0'],
'grpcio>=1.11.0,<1.12.0',
'cloudpickle==0.8.1'],
python_requires=">=3.5,<3.8",
)

26
ml-agents/mlagents/trainers/learn.py


trainer_config_path = run_options['<trainer-config-path>']
# Recognize and use docker volume if one is passed as an argument
if not docker_target_name:
model_path = './models/{run_id}'.format(run_id=run_id)
model_path = './models/{run_id}-{sub_id}'.format(run_id=run_id, sub_id=sub_id)
summaries_dir = './summaries'
else:
trainer_config_path = \

'/{docker_target_name}/{curriculum_folder}'.format(
docker_target_name=docker_target_name,
curriculum_folder=curriculum_folder)
model_path = '/{docker_target_name}/models/{run_id}'.format(
model_path = '/{docker_target_name}/models/{run_id}-{sub_id}'.format(
run_id=run_id)
run_id=run_id,
sub_id=sub_id)
summaries_dir = '/{docker_target_name}/summaries'.format(
docker_target_name=docker_target_name)

docker_target_name,
no_graphics,
run_seed,
base_port + (sub_id * num_envs),
fast_simulation
base_port + (sub_id * num_envs)
)
env = SubprocessUnityEnvironment(env_factory, num_envs)
maybe_meta_curriculum = try_create_meta_curriculum(curriculum_folder, env)

save_freq, maybe_meta_curriculum,
load_model, train_model,
keep_checkpoints, lesson, env.external_brains,
run_seed)
run_seed, fast_simulation)
# Signal that environment has been launched.
process_queue.put(True)

docker_target_name: str,
no_graphics: bool,
seed: Optional[int],
start_port: int,
fast_simulation: bool
start_port: int
) -> Callable[[int], BaseUnityEnvironment]:
if env_path is not None:
# Strip out executable extensions if passed

seed=env_seed,
docker_training=docker_training,
no_graphics=no_graphics,
base_port=start_port,
train_mode=(not fast_simulation)
base_port=start_port
)
return create_unity_environment

# Wait for signal that environment has successfully launched
while process_queue.get() is not True:
continue
# Wait for jobs to complete. Otherwise we'll have an extra
# unhandled KeyboardInterrupt if we end early.
try:
for job in jobs:
job.join()
except KeyboardInterrupt:
pass
# For python debugger to directly run this script
if __name__ == "__main__":

7
ml-agents/mlagents/trainers/tests/test_learn.py


with patch.object(TrainerController, "start_learning", MagicMock()):
learn.run_training(0, 0, basic_options(), MagicMock())
mock_init.assert_called_once_with(
'./models/ppo',
'./models/ppo-0',
'./summaries',
'ppo-0',
50000,

5,
0,
subproc_env_mock.return_value.external_brains,
0
0,
True
)

with patch.object(TrainerController, "start_learning", MagicMock()):
learn.run_training(0, 0, options_with_docker_target, MagicMock())
mock_init.assert_called_once()
assert(mock_init.call_args[0][0] == '/dockertarget/models/ppo')
assert(mock_init.call_args[0][0] == '/dockertarget/models/ppo-0')
assert(mock_init.call_args[0][1] == '/dockertarget/summaries')

5
ml-agents/mlagents/trainers/tests/test_trainer_controller.py


keep_checkpoints=False,
lesson=None,
external_brains={'testbrain': brain_info},
training_seed=99
training_seed=99,
fast_simulation=True
)
@patch('numpy.random.seed')

TrainerController('', '', '1', 1, None, True, False, False, None, {}, seed)
TrainerController('', '', '1', 1, None, True, False, False, None, {}, seed, True)
numpy_random_seed.assert_called_with(seed)
tensorflow_set_seed.assert_called_with(seed)

28
ml-agents/mlagents/trainers/trainer_controller.py


import logging
import shutil
import sys
if sys.platform.startswith('win'):
import win32api
import win32con
from typing import *
import numpy as np

from mlagents.envs import AllBrainInfo, BrainParameters
from mlagents.envs.base_unity_environment import BaseUnityEnvironment
from mlagents.envs.exception import UnityEnvironmentException
from mlagents.trainers import Trainer, Policy
from mlagents.trainers import Trainer
from mlagents.trainers.ppo.trainer import PPOTrainer
from mlagents.trainers.bc.offline_trainer import OfflineBCTrainer
from mlagents.trainers.bc.online_trainer import OnlineBCTrainer

keep_checkpoints: int,
lesson: Optional[int],
external_brains: Dict[str, BrainParameters],
training_seed: int):
training_seed: int,
fast_simulation: bool):
"""
:param model_path: Path to save the model.
:param summaries_dir: Folder to save training summaries.

self.meta_curriculum = meta_curriculum
self.seed = training_seed
self.training_start_time = time()
self.fast_simulation = fast_simulation
np.random.seed(self.seed)
tf.set_random_seed(self.seed)

'while the graph is generated.')
self._save_model(steps)
def _win_handler(self, event):
"""
This function gets triggered after ctrl-c or ctrl-break is pressed
under Windows platform.
"""
if event in (win32con.CTRL_C_EVENT, win32con.CTRL_BREAK_EVENT):
self._save_model_when_interrupted(self.global_step)
self._export_graph()
sys.exit()
return True
return False
def _write_training_metrics(self):
"""
Write all CSV metrics

environment.
"""
if self.meta_curriculum is not None:
return env.reset(config=self.meta_curriculum.get_config())
return env.reset(train_mode=self.fast_simulation, config=self.meta_curriculum.get_config())
return env.reset()
return env.reset(train_mode=self.fast_simulation)
def start_learning(self, env: BaseUnityEnvironment, trainer_config):
# TODO: Should be able to start learning at different lesson numbers

for brain_name, trainer in self.trainers.items():
trainer.write_tensorboard_text('Hyperparameters',
trainer.parameters)
if sys.platform.startswith('win'):
# Add the _win_handler function to the windows console's handler function list
win32api.SetConsoleCtrlHandler(self._win_handler, True)
try:
curr_info = self._reset_env(env)
while any([t.get_step <= t.get_max_steps \

4
ml-agents/setup.py


setup(
name='mlagents',
version='0.7.0',
version='0.8.0',
description='Unity Machine Learning Agents',
long_description=long_description,
long_description_content_type='text/markdown',

zip_safe=False,
install_requires=[
'mlagents_envs==0.7.0',
'mlagents_envs==0.8.0',
'tensorflow>=1.7,<1.8',
'Pillow>=4.2.1',
'matplotlib',

168
docs/Creating-Custom-Protobuf-Messages.md


# Creating Custom Protobuf Messages
Unity and Python communicate by sending protobuf messages to and from each other. You can create custom protobuf messages if you want to exchange structured data beyond what is included by default.
## Implementing a Custom Message
Assume the ml-agents repository is checked out to a folder named $MLAGENTS_ROOT. Whenever you change the fields of a custom message, you must run `$MLAGENTS_ROOT/protobuf-definitions/make.bat` to create C# and Python files corresponding to the new message. Follow the directions in [this file](../protobuf-definitions/README.md) for guidance. After running `$MLAGENTS_ROOT/protobuf-definitions/make.bat`, reinstall the Python package by running `pip install $MLAGENTS_ROOT/ml-agents` and make sure your Unity project is using the newly-generated version of `$MLAGENTS_ROOT/UnitySDK`.
## Custom Message Types
There are three custom message types currently supported - Custom Actions, Custom Reset Parameters, and Custom Observations. In each case, `env` is an instance of a `UnityEnvironment` in Python.
### Custom Actions
By default, the Python API sends actions to Unity in the form of a floating point list and an optional string-valued text action for each agent.
You can define a custom action type, to either replace or augment the default, by adding fields to the `CustomAction` message, which you can do by editing the file `protobuf-definitions/proto/mlagents/envs/communicator_objects/custom_action.proto`.
Instances of custom actions are set via the `custom_action` parameter of `env.step`. An agent receives a custom action by defining a method with the signature:
```csharp
public virtual void AgentAction(float[] vectorAction, string textAction, CommunicatorObjects.CustomAction customAction)
```
Below is an example of creating a custom action that instructs an agent to choose a cardinal direction to walk in and how far to walk.
The `custom_action.proto` file looks like:
```protobuf
syntax = "proto3";
option csharp_namespace = "MLAgents.CommunicatorObjects";
package communicator_objects;
message CustomAction {
    enum Direction {
        NORTH=0;
        SOUTH=1;
        EAST=2;
        WEST=3;
    }
    float walkAmount = 1;
    Direction direction = 2;
}
```
The Python instance of the custom action looks like:
```python
from mlagents.envs.communicator_objects import CustomAction
env = mlagents.envs.UnityEnvironment(...)
...
action = CustomAction(direction=CustomAction.NORTH, walkAmount=2.0)
env.step(custom_action=action)
```
And the agent code looks like:
```csharp
...
using MLAgents;
using MLAgents.CommunicatorObjects;
class MyAgent : Agent {
    ...
    override public void AgentAction(float[] vectorAction, string textAction, CustomAction customAction) {
        switch(customAction.Direction) {
            case CustomAction.Types.Direction.North:
                transform.Translate(0, 0, customAction.WalkAmount);
                break;
            ...
        }
    }
}
```
Keep in mind that the protobuffer compiler automatically configures the capitalization scheme of the C# version of the custom field names you defined in the `CustomAction` message to match C# conventions - "NORTH" becomes "North", "walkAmount" becomes "WalkAmount", etc.
### Custom Reset Parameters
By default, you can configure an environment `env` in the Python API by specifying a `config` parameter that is a dictionary mapping strings to floats.
You can also configure the environment reset using a custom protobuf message. To do this, add fields to the `CustomResetParameters` protobuf message in `custom_reset_parameters.proto`, analogously to `CustomAction` above. Then pass an instance of the message to `env.reset` via the `custom_reset_parameters` keyword parameter.
In Unity, you can then access the `customResetParameters` field of your academy to access the values set in your Python script.
In this example, the academy is setting the initial position of a box based on custom reset parameters. The `custom_reset_parameters.proto` would look like:
```protobuf
message CustomResetParameters {
    message Position {
        float x = 1;
        float y = 2;
        float z = 3;
    }
    message Color {
        float r = 1;
        float g = 2;
        float b = 3;
    }
    Position initialPos = 1;
    Color color = 2;
}
```
The Python instance of the custom reset parameter looks like
```python
from mlagents.envs.communicator_objects import CustomResetParameters
env = ...
pos = CustomResetParameters.Position(x=1, y=1, z=2)
color = CustomResetParameters.Color(r=.5, g=.1, b=1.0)
params = CustomResetParameters(initialPos=pos, color=color)
env.reset(custom_reset_parameters=params)
```
The academy looks like
```csharp
public class MyAcademy : Academy
{
    public GameObject box; // This would be connected to a game object in your scene in the Unity editor.
    override public void AcademyReset()
    {
        var boxParams = customResetParameters;
        if (boxParams != null)
        {
            var pos = boxParams.InitialPos;
            var color = boxParams.Color;
            box.transform.position = new Vector3(pos.X, pos.Y, pos.Z);
            box.GetComponent<Renderer>().material.color = new Color(color.R, color.G, color.B);
        }
    }
}
```
### Custom Observations
By default, Unity returns observations to Python in the form of a floating-point vector.
You can define a custom observation message to supplement that. To do so, add fields to the `CustomObservation` protobuf message in `custom_observation.proto`.
Then in your agent, create an instance of a custom observation via `new CommunicatorObjects.CustomObservation`. Then in `CollectObservations`, call `SetCustomObservation` with the custom observation instance as the parameter.
In Python, the custom observation can be accessed by calling `env.step` or `env.reset` and accessing the `custom_observations` property of the return value. It will contain a list with one `CustomObservation` instance per agent.
For example, if you have added a field called `customField` to the `CustomObservation` message, the agent code looks like:
```csharp
class MyAgent : Agent {
    override public void CollectObservations() {
        var obs = new CustomObservation();
        obs.CustomField = 1.0;
        SetCustomObservation(obs);
    }
}
```
In Python, the custom field would be accessed like:
```python
...
result = env.step(...)
result[brain_name].custom_observations[0].customField
```
where `brain_name` is the name of the brain attached to the agent.

25
docs/Training-Using-Concurrent-Unity-Instances.md


# Training Using Concurrent Unity Instances
As part of release v0.8, we enabled developers to run concurrent, parallel instances of the Unity executable during training. For certain scenarios, this should speed up the training.
## How to Run Concurrent Unity Instances During Training
Please refer to the general instructions on [Training ML-Agents](Training-ML-Agents.md). In order to run concurrent Unity instances during training, set the number of environment instances using the command line option `--num-envs=<n>` when you invoke `mlagents-learn`. Optionally, you can also set the `--base-port`, which is the starting port used for the concurrent Unity instances.
## Considerations
### Buffer Size
If you are having trouble getting an agent to train, even with multiple concurrent Unity instances, you could increase `buffer_size` in the `config/trainer_config.yaml` file. A common practice is to multiply `buffer_size` by `num-envs`.
### Resource Constraints
Invoking concurrent Unity instances is constrained by the resources on the machine. Please use discretion when setting `--num-envs=<n>`.
### Using num-runs and num-envs
If you set `--num-runs=<n>` greater than 1 and are also invoking concurrent Unity instances using `--num-envs=<n>`, then the number of concurrent Unity instances is equal to `num-runs` times `num-envs`.
### Result Variation Using Concurrent Unity Instances
If you keep all the hyperparameters the same, but change `--num-envs=<n>`, the results and model would likely change.
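As a rough illustration of the considerations above (this sketch is not part of the toolkit, and the single-environment `buffer_size` of 10240 is only an assumed example value):

```python
# Total concurrent Unity instances when combining --num-runs and --num-envs.
num_runs = 2
num_envs = 4
total_unity_instances = num_runs * num_envs  # 8 concurrent Unity processes

# Common practice from the Buffer Size note: scale buffer_size by num-envs.
single_env_buffer_size = 10240                          # assumed baseline value
scaled_buffer_size = single_env_buffer_size * num_envs  # 40960
print(total_unity_instances, scaled_buffer_size)
```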

4
docs/Using-TensorFlow-Sharp-in-Unity.md


# Using TensorFlowSharp in Unity
As of version 0.7.0, we have included our own Inference Engine as a replacement for TFS. Please refer to the [release notes](https://github.com/Unity-Technologies/ml-agents/releases/tag/0.7.0) and [Unity Inference Engine documentation](Unity-Inference-Engine.md)

167
docs/Custom-Protos.md


# Creating custom protobuf messages
Unity and Python communicate by sending protobuf messages to and from each other. You can create custom protobuf messages if you want to exchange structured data beyond what is included by default.
Assume the ml-agents repository is checked out to a folder named $MLAGENTS_ROOT. Whenever you change the fields of a custom message, you must run `$MLAGENTS_ROOT/protobuf-definitions/make.bat` to create C# and Python files corresponding to the new message. Follow the directions in [this file](../protobuf-definitions/README.md) for guidance. After running it, reinstall the Python package by running `pip install $MLAGENTS_ROOT/ml-agents` and make sure your Unity project is using the newly-generated version of `$MLAGENTS_ROOT/UnitySDK`.
## Custom message types
There are three custom message types currently supported, described below. In each case, `env` is an instance of a `UnityEnvironment` in Python. `CustomAction` is described most thoroughly; usage of the other custom messages follows a similar template.
### Custom actions
By default, the Python API sends actions to Unity in the form of a floating-point list per agent and an optional string-valued text action.
You can define a custom action type to replace or augment this by adding fields to the `CustomAction` message, which you can do by editing the file `protobuf-definitions/proto/mlagents/envs/communicator_objects/custom_action.proto`.
Instances of custom actions are set via the `custom_action` parameter of `env.step`. An agent receives a custom action by defining a method with the signature
```csharp
public virtual void AgentAction(float[] vectorAction, string textAction, CommunicatorObjects.CustomAction customAction)
```
Here is an example of creating a custom action that instructs an agent to choose a cardinal direction to walk in and how far to walk.
`custom_action.proto` will look like
```protobuf
syntax = "proto3";
option csharp_namespace = "MLAgents.CommunicatorObjects";
package communicator_objects;
message CustomAction {
    enum Direction {
        NORTH=0;
        SOUTH=1;
        EAST=2;
        WEST=3;
    }
    float walkAmount = 1;
    Direction direction = 2;
}
```
In your Python file, create an instance of a custom action:
```python
from mlagents.envs.communicator_objects import CustomAction
env = mlagents.envs.UnityEnvironment(...)
...
action = CustomAction(direction=CustomAction.NORTH, walkAmount=2.0)
env.step(custom_action=action)
```
Then in your agent,
```csharp
...
using MLAgents;
using MLAgents.CommunicatorObjects;
class MyAgent : Agent {
    ...
    override public void AgentAction(float[] vectorAction, string textAction, CustomAction customAction) {
        switch(customAction.Direction) {
            case CustomAction.Types.Direction.North:
                transform.Translate(0, 0, customAction.WalkAmount);
                break;
            ...
        }
    }
}
```
Note that the protobuffer compiler automatically configures the capitalization scheme of the C# version of the custom field names you defined in the `CustomAction` message to match C# conventions - "NORTH" becomes "North", "walkAmount" becomes "WalkAmount", etc.
### Custom reset parameters
By default, you can configure an environment `env` in the Python API by specifying a `config` parameter that is a dictionary mapping strings to floats.
You can also configure an environment using a custom protobuf message. To do so, add fields to the `CustomResetParameters` protobuf message in `custom_reset_parameters.proto`, analogously to `CustomAction` above. Then pass an instance of the message to `env.reset` via the `custom_reset_parameters` keyword parameter.
In Unity, you can then access the `customResetParameters` field of your academy to access the values set in your Python script.
In this example, an academy is setting the initial position of a box based on custom reset parameters that looks like
```protobuf
message CustomResetParameters {
    message Position {
        float x = 1;
        float y = 2;
        float z = 3;
    }
    message Color {
        float r = 1;
        float g = 2;
        float b = 3;
    }
    Position initialPos = 1;
    Color color = 2;
}
```
In your academy, you'd have something like
```csharp
public class MyAcademy : Academy
{
    public GameObject box; // This would be connected to a game object in your scene in the Unity editor.
    override public void AcademyReset()
    {
        var boxParams = customResetParameters;
        if (boxParams != null)
        {
            var pos = boxParams.InitialPos;
            var color = boxParams.Color;
            box.transform.position = new Vector3(pos.X, pos.Y, pos.Z);
            box.GetComponent<Renderer>().material.color = new Color(color.R, color.G, color.B);
        }
    }
}
```
Then in Python, when setting up your scene, you might write
```python
from mlagents.envs.communicator_objects import CustomResetParameters
env = ...
pos = CustomResetParameters.Position(x=1, y=1, z=2)
color = CustomResetParameters.Color(r=.5, g=.1, b=1.0)
params = CustomResetParameters(initialPos=pos, color=color)
env.reset(custom_reset_parameters=params)
```
### Custom observations
By default, Unity returns observations to Python in the form of a floating-point vector.
You can define a custom observation message to supplement that. To do so, add fields to the `CustomObservation` protobuf message in `custom_observation.proto`.
Then in your agent, create an instance of a custom observation via `new CommunicatorObjects.CustomObservation`. Then in `CollectObservations`, call `SetCustomObservation` with the custom observation instance as the parameter.
In Python, the custom observation can be accessed by calling `env.step` or `env.reset` and accessing the `custom_observations` property of the return value. It will contain a list with one `CustomObservation` instance per agent.
For example, if you have added a field called `customField` to the `CustomObservation` message, you would program your agent like
```csharp
class MyAgent : Agent {
    override public void CollectObservations() {
        var obs = new CustomObservation();
        obs.CustomField = 1.0;
        SetCustomObservation(obs);
    }
}
```
Then in Python, the custom field would be accessed like
```python
...
result = env.step(...)
result[brain_name].custom_observations[0].customField
```
where `brain_name` is the name of the brain attached to the agent.

/ml-agents/tests/trainers/test_trainer_metrics.py → /ml-agents/mlagents/trainers/tests/test_trainer_metrics.py

/ml-agents/tests/envs/test_subprocess_unity_environment.py → /ml-agents-envs/mlagents/envs/tests/test_subprocess_unity_environment.py
