
Merge branch 'master' into hh/develop/ragdoll-updates

/hh-develop-ragdoll-testing
HH, 5 years ago
Current commit
7afa1761
78 files changed, with 2176 insertions and 2315 deletions
  1. .pre-commit-search-and-replace.yaml (4)
  2. .pylintrc (2)
  3. Project/Assets/ML-Agents/Examples/SharedAssets/Scripts/DirectionIndicator.cs (12)
  4. Project/Assets/ML-Agents/Examples/SharedAssets/Scripts/JointDriveController.cs (3)
  5. Project/Assets/ML-Agents/Examples/Walker/Demos/ExpertWalkerDyna.demo.meta (2)
  6. Project/Assets/ML-Agents/Examples/Walker/Demos/ExpertWalkerStat.demo.meta (2)
  7. Project/Assets/ML-Agents/Examples/Walker/Prefabs/DynamicPlatformWalker.prefab (87)
  8. Project/Assets/ML-Agents/Examples/Walker/Prefabs/WalkerWithTargetPair.prefab (82)
  9. Project/Assets/ML-Agents/Examples/Walker/Scenes/WalkerStatic.unity (186)
  10. Project/Assets/ML-Agents/Examples/Walker/Scripts/WalkerAgent.cs (122)
  11. Project/Assets/ML-Agents/Examples/Walker/TFModels/WalkerDynamic.nn (1001)
  12. Project/Assets/ML-Agents/Examples/Walker/TFModels/WalkerDynamic.nn.meta (2)
  13. Project/Assets/ML-Agents/Examples/Walker/TFModels/WalkerStatic.nn (1001)
  14. Project/Assets/ML-Agents/Examples/Walker/TFModels/WalkerStatic.nn.meta (2)
  15. README.md (4)
  16. com.unity.ml-agents/CHANGELOG.md (34)
  17. com.unity.ml-agents/Runtime/Academy.cs (10)
  18. com.unity.ml-agents/Runtime/Communicator/RpcCommunicator.cs (8)
  19. com.unity.ml-agents/Runtime/Policies/BehaviorParameters.cs (12)
  20. com.unity.ml-agents/Runtime/Sensors/Reflection/EnumReflectionSensor.cs (4)
  21. com.unity.ml-agents/package.json (2)
  22. config/ppo/SoccerTwos.yaml (2)
  23. config/ppo/StrikersVsGoalie.yaml (4)
  24. config/ppo/Tennis.yaml (6)
  25. config/sac/WallJump.yaml (8)
  26. config/sac/WalkerDynamic.yaml (12)
  27. docs/Background-Machine-Learning.md (2)
  28. docs/FAQ.md (14)
  29. docs/Installation.md (6)
  30. docs/Learning-Environment-Create-New.md (38)
  31. docs/Learning-Environment-Design-Agents.md (274)
  32. docs/Learning-Environment-Design.md (10)
  33. docs/Migrating.md (16)
  34. docs/Training-Configuration-File.md (3)
  35. docs/Training-ML-Agents.md (15)
  36. docs/Training-on-Amazon-Web-Service.md (2)
  37. docs/Using-Tensorboard.md (3)
  38. gym-unity/gym_unity/__init__.py (2)
  39. markdown-link-check.full.json (2)
  40. ml-agents-envs/mlagents_envs/__init__.py (2)
  41. ml-agents-envs/mlagents_envs/base_env.py (32)
  42. ml-agents-envs/mlagents_envs/registry/binary_utils.py (12)
  43. ml-agents-envs/mlagents_envs/rpc_communicator.py (5)
  44. ml-agents-envs/mlagents_envs/tests/test_steps.py (34)
  45. ml-agents/README.md (2)
  46. ml-agents/mlagents/model_serialization.py (42)
  47. ml-agents/mlagents/trainers/__init__.py (2)
  48. ml-agents/mlagents/trainers/cli_utils.py (14)
  49. ml-agents/mlagents/trainers/ghost/trainer.py (4)
  50. ml-agents/mlagents/trainers/learn.py (23)
  51. ml-agents/mlagents/trainers/meta_curriculum.py (25)
  52. ml-agents/mlagents/trainers/policy/tf_policy.py (62)
  53. ml-agents/mlagents/trainers/ppo/trainer.py (21)
  54. ml-agents/mlagents/trainers/sac/trainer.py (23)
  55. ml-agents/mlagents/trainers/settings.py (5)
  56. ml-agents/mlagents/trainers/tests/test_learn.py (15)
  57. ml-agents/mlagents/trainers/tests/test_meta_curriculum.py (23)
  58. ml-agents/mlagents/trainers/tests/test_nn_policy.py (21)
  59. ml-agents/mlagents/trainers/tests/test_policy.py (8)
  60. ml-agents/mlagents/trainers/tests/test_ppo.py (1)
  61. ml-agents/mlagents/trainers/tests/test_rl_trainer.py (49)
  62. ml-agents/mlagents/trainers/tests/test_sac.py (17)
  63. ml-agents/mlagents/trainers/tests/test_simple_rl.py (20)
  64. ml-agents/mlagents/trainers/tests/test_trainer_controller.py (2)
  65. ml-agents/mlagents/trainers/trainer/rl_trainer.py (38)
  66. ml-agents/mlagents/trainers/trainer/trainer.py (4)
  67. ml-agents/mlagents/trainers/trainer_controller.py (24)
  68. utils/validate_meta_files.py (4)
  69. Project/Assets/ML-Agents/Examples/SharedAssets/Prefabs/OrientationCube.prefab (297)
  70. Project/Assets/ML-Agents/Examples/SharedAssets/Prefabs/OrientationCube.prefab.meta (7)
  71. config/sac/WalkerStatic.yaml (29)
  72. ml-agents/mlagents/trainers/tests/test_config_conversion.py (191)
  73. ml-agents/mlagents/trainers/tests/test_training_status.py (60)
  74. ml-agents/mlagents/trainers/training_status.py (115)
  75. ml-agents/mlagents/trainers/upgrade_config.py (137)
  76. Project/Assets/ExpertWalkerDyn.demo.meta (10)
  77. config/upgrade_config.py (110)
  78. /config/sac/WalkerDynamic.yaml (0)

4
.pre-commit-search-and-replace.yaml


search: /ML[ -]Agents toolkit/
replacement: ML-Agents Toolkit
insensitive: true
- description: Replace "the the"
search: /the the/
replacement: the
insensitive: true

2
.pylintrc


[MASTER]
# Add files or directories to the blacklist. They should be base names, not
# Add files or directories to the ignore list. They should be base names, not
# paths.
ignore=CVS

12
Project/Assets/ML-Agents/Examples/SharedAssets/Scripts/DirectionIndicator.cs


public Transform transformToFollow; //ex: hips or body
public Transform targetToLookAt; //target in the scene the indicator will point to
public float heightOffset;
private Vector3 m_StartingPos;
private float m_StartingYPos;
m_StartingPos = transform.position;
m_StartingYPos = transform.position.y;
transform.position = new Vector3(transformToFollow.position.x, m_StartingPos.y + heightOffset, transformToFollow.position.z);
Vector3 m_WalkDir = targetToLookAt.position - transform.position;
m_WalkDir.y = 0; //flatten dir on the y
transform.rotation = Quaternion.LookRotation(m_WalkDir);
transform.position = new Vector3(transformToFollow.position.x, m_StartingYPos + heightOffset, transformToFollow.position.z);
Vector3 walkDir = targetToLookAt.position - transform.position;
walkDir.y = 0; //flatten dir on the y
transform.rotation = Quaternion.LookRotation(walkDir);
}
}
}
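
Only the changed lines of `DirectionIndicator.cs` appear above. A hedged reconstruction of the resulting behavior, with the MonoBehaviour boilerplate assumed rather than taken from the diff:

```csharp
using UnityEngine;

// Sketch only: follows a body part on X/Z, keeps a stable height based on the
// indicator's own starting Y (the change above replaces the cached full position
// with a cached starting height), and rotates to face the target.
public class DirectionIndicatorSketch : MonoBehaviour
{
    public Transform transformToFollow; // ex: hips or body
    public Transform targetToLookAt;    // target in the scene the indicator will point to
    public float heightOffset;

    float m_StartingYPos;

    void Awake()
    {
        m_StartingYPos = transform.position.y;
    }

    void Update()
    {
        // Track the followed transform on X/Z at a fixed height.
        transform.position = new Vector3(
            transformToFollow.position.x,
            m_StartingYPos + heightOffset,
            transformToFollow.position.z);

        // Point at the target, flattened on the Y axis.
        var walkDir = targetToLookAt.position - transform.position;
        walkDir.y = 0;
        transform.rotation = Quaternion.LookRotation(walkDir);
    }
}
```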

3
Project/Assets/ML-Agents/Examples/SharedAssets/Scripts/JointDriveController.cs


[HideInInspector] public Dictionary<Transform, BodyPart> bodyPartsDict = new Dictionary<Transform, BodyPart>();
[HideInInspector] public List<BodyPart> bodyPartsList = new List<BodyPart>();
const float k_MaxAngularVelocity = 50.0f;
/// <summary>
/// Create BodyPart object and add it to dictionary.

startingPos = t.position,
startingRot = t.rotation
};
bp.rb.maxAngularVelocity = 50;
bp.rb.maxAngularVelocity = k_MaxAngularVelocity;
// Add & setup the ground contact script
bp.groundContact = t.GetComponent<GroundContact>();
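
The change here is small: the hard-coded angular-velocity cap becomes a named constant applied when each body part is set up. A minimal sketch of the pattern (this is not the full `JointDriveController`; the helper below is simplified for illustration):

```csharp
using UnityEngine;

public class BodyPartSetupSketch : MonoBehaviour
{
    // One place to define the cap instead of a magic number at the call site.
    const float k_MaxAngularVelocity = 50.0f;

    // Simplified stand-in for SetupBodyPart: the real method also wires joints,
    // ground-contact scripts, and the bodyParts dictionary/list.
    public void SetupBodyPart(Transform t)
    {
        var rb = t.GetComponent<Rigidbody>();
        rb.maxAngularVelocity = k_MaxAngularVelocity; // previously hard-coded as 50
    }
}
```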

2
Project/Assets/ML-Agents/Examples/Walker/Demos/ExpertWalkerDyna.demo.meta


guid: 1ea82869060c54bb48fed5b95baaf53c
ScriptedImporter:
fileIDToRecycleName:
11400000: Assets/Demonstrations/ExpertWalkerDyna.demo
11400002: Assets/ML-Agents/Examples/Walker/Demos/ExpertWalkerDyna.demo
externalObjects: {}
userData: ' (Unity.MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:

2
Project/Assets/ML-Agents/Examples/Walker/Demos/ExpertWalkerStat.demo.meta


guid: 720007cd6923e410abaa4ba800400cb0
ScriptedImporter:
fileIDToRecycleName:
11400000: Assets/Demonstrations/ExpertWalkerStat.demo
11400002: Assets/ML-Agents/Examples/Walker/Demos/ExpertWalkerStat.demo
externalObjects: {}
userData: ' (Unity.MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:

87
Project/Assets/ML-Agents/Examples/Walker/Prefabs/DynamicPlatformWalker.prefab


m_SortingLayerID: 0
m_SortingLayer: 0
m_SortingOrder: 0
--- !u!1001 &758428436173755182
PrefabInstance:
m_ObjectHideFlags: 0
serializedVersion: 2
m_Modification:
m_TransformParent: {fileID: 6065910098925129092}
m_Modifications:
- target: {fileID: 2591864625898824423, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_IsActive
value: 0
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999504, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_LocalPosition.x
value: 0
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999504, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_LocalPosition.y
value: 0
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999504, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_LocalPosition.z
value: 0
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999504, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_LocalRotation.x
value: 0
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999504, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_LocalRotation.y
value: 0
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999504, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_LocalRotation.z
value: 0
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999504, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_LocalRotation.w
value: 1
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999504, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_RootOrder
value: 2
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999504, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_LocalEulerAnglesHint.x
value: 0
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999504, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_LocalEulerAnglesHint.y
value: 0
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999504, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_LocalEulerAnglesHint.z
value: 0
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999519, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_Name
value: OrientationCube
objectReference: {fileID: 0}
m_RemovedComponents: []
m_SourcePrefab: {fileID: 100100000, guid: 41960beaa3d8041e19d82b5160042e55, type: 3}
--- !u!1 &2989930013812587953 stripped
GameObject:
m_CorrespondingSourceObject: {fileID: 2591864627249999519, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
m_PrefabInstance: {fileID: 758428436173755182}
m_PrefabAsset: {fileID: 0}
--- !u!1001 &6359877978260855390
PrefabInstance:
m_ObjectHideFlags: 0

type: 3}
propertyPath: m_Model
value:
objectReference: {fileID: 11400000, guid: a4ab86f6c972e43d6bbb52cd0dfffe07,
objectReference: {fileID: 11400000, guid: 2cb15010f7cbe4dc59418a5858c87819,
type: 3}
- target: {fileID: 895268871377934297, guid: 765582efd9dda46ed98564603316353f,
type: 3}

propertyPath: ground
value:
objectReference: {fileID: 4715966284166353839}
- target: {fileID: 7408209125961349353, guid: 765582efd9dda46ed98564603316353f,
type: 3}
propertyPath: orientationCube
value:
objectReference: {fileID: 2989930013812587953}
m_RemovedComponents: []
m_SourcePrefab: {fileID: 100100000, guid: 765582efd9dda46ed98564603316353f, type: 3}
--- !u!4 &6065910098925129092 stripped

82
Project/Assets/ML-Agents/Examples/Walker/Prefabs/WalkerWithTargetPair.prefab


type: 3}
propertyPath: m_Model
value:
objectReference: {fileID: 11400000, guid: 9091f3caa96b043b1afcc573049ca578,
objectReference: {fileID: 11400000, guid: 1cb7f6cc571fb4376b972bc090627b6d,
type: 3}
- target: {fileID: 895268871377934298, guid: 765582efd9dda46ed98564603316353f,
type: 3}

propertyPath: respawnTargetWhenTouched
value: 0
objectReference: {fileID: 0}
- target: {fileID: 7408209125961349353, guid: 765582efd9dda46ed98564603316353f,
type: 3}
propertyPath: orientationCube
value:
objectReference: {fileID: 5269189931577362882}
- target: {fileID: 7933235353030744139, guid: 765582efd9dda46ed98564603316353f,
type: 3}
propertyPath: m_ConnectedAnchor.x

type: 3}
m_PrefabInstance: {fileID: 2906899243981837092}
m_PrefabAsset: {fileID: 0}
--- !u!1001 &7703349395854010205
PrefabInstance:
m_ObjectHideFlags: 0
serializedVersion: 2
m_Modification:
m_TransformParent: {fileID: 2610895078227559678}
m_Modifications:
- target: {fileID: 2591864627249999504, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_LocalPosition.x
value: 0
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999504, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_LocalPosition.y
value: 0
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999504, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_LocalPosition.z
value: 0
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999504, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_LocalRotation.x
value: 0
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999504, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_LocalRotation.y
value: 0
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999504, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_LocalRotation.z
value: 0
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999504, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_LocalRotation.w
value: 1
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999504, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_RootOrder
value: 2
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999504, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_LocalEulerAnglesHint.x
value: 0
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999504, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_LocalEulerAnglesHint.y
value: 0
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999504, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_LocalEulerAnglesHint.z
value: 0
objectReference: {fileID: 0}
- target: {fileID: 2591864627249999519, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
propertyPath: m_Name
value: OrientationCube
objectReference: {fileID: 0}
m_RemovedComponents: []
m_SourcePrefab: {fileID: 100100000, guid: 41960beaa3d8041e19d82b5160042e55, type: 3}
--- !u!1 &5269189931577362882 stripped
GameObject:
m_CorrespondingSourceObject: {fileID: 2591864627249999519, guid: 41960beaa3d8041e19d82b5160042e55,
type: 3}
m_PrefabInstance: {fileID: 7703349395854010205}
m_PrefabAsset: {fileID: 0}

186
Project/Assets/ML-Agents/Examples/Walker/Scenes/WalkerStatic.unity


propertyPath: m_Name
value: WalkerPair (1)
objectReference: {fileID: 0}
- target: {fileID: 1707482909815496, guid: 94dced9d2186d4a76b970fb18ef6d7a6, type: 3}
propertyPath: m_IsActive
value: 1
objectReference: {fileID: 0}
- target: {fileID: 4878380427462518, guid: 94dced9d2186d4a76b970fb18ef6d7a6, type: 3}
propertyPath: m_LocalPosition.x
value: -500

propertyPath: m_RootOrder
value: 4
objectReference: {fileID: 0}
- target: {fileID: 5066517641317679859, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.x
value: -0.6999959
objectReference: {fileID: 0}
- target: {fileID: 5066517641317679859, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.y
value: 0
objectReference: {fileID: 0}
- target: {fileID: 5066517641317679859, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.z
value: -0.00000023841848
objectReference: {fileID: 0}
- target: {fileID: 5066517641327561583, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.x
value: -0.6999967
objectReference: {fileID: 0}
- target: {fileID: 5066517641327561583, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.y
value: 0
objectReference: {fileID: 0}
- target: {fileID: 5066517641327561583, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.z
value: -0.00000005960462
objectReference: {fileID: 0}
- target: {fileID: 5066517641430005247, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.x
value: 0.4999994
objectReference: {fileID: 0}
- target: {fileID: 5066517641430005247, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.y
value: 0
objectReference: {fileID: 0}
- target: {fileID: 5066517641430005247, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.z
value: 0.00000023841848
objectReference: {fileID: 0}
- target: {fileID: 5066517641468855679, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.x
value: 0.00000017881393
objectReference: {fileID: 0}
- target: {fileID: 5066517641468855679, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.y
value: -0.5
objectReference: {fileID: 0}
- target: {fileID: 5066517641468855679, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.z
value: -0.00000005960462
objectReference: {fileID: 0}
- target: {fileID: 5066517641503708862, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.x
value: -0.4999994
objectReference: {fileID: 0}
- target: {fileID: 5066517641503708862, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.y
value: 0
objectReference: {fileID: 0}
- target: {fileID: 5066517641503708862, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.z
value: -0.00000023841848
objectReference: {fileID: 0}
- target: {fileID: 5066517641529767720, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.x
value: -0.39999378
objectReference: {fileID: 0}
- target: {fileID: 5066517641529767720, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.y
value: -0.29999986
objectReference: {fileID: 0}
- target: {fileID: 5066517641529767720, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.z
value: -0.00000005960462
objectReference: {fileID: 0}
- target: {fileID: 5066517641893800742, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.x
value: -0.00000017881393
objectReference: {fileID: 0}
- target: {fileID: 5066517641893800742, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.z
value: 0.00000005960462
objectReference: {fileID: 0}
- target: {fileID: 5066517641985220144, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.x
value: 0.39999378
objectReference: {fileID: 0}
- target: {fileID: 5066517641985220144, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.y
value: -0.29999986
objectReference: {fileID: 0}
- target: {fileID: 5066517641985220144, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.z
value: 0.00000005960462
objectReference: {fileID: 0}
- target: {fileID: 5066517641988116231, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.x
value: -0.00000017881393
objectReference: {fileID: 0}
- target: {fileID: 5066517641988116231, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.y
value: -0.5
objectReference: {fileID: 0}
- target: {fileID: 5066517641988116231, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.z
value: 0.00000005960462
objectReference: {fileID: 0}
- target: {fileID: 5066517642825943758, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.x
value: 0.6999959
objectReference: {fileID: 0}
- target: {fileID: 5066517642825943758, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.y
- target: {fileID: 4878380427462518, guid: 94dced9d2186d4a76b970fb18ef6d7a6, type: 3}
propertyPath: m_LocalEulerAnglesHint.x
- target: {fileID: 5066517642825943758, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.z
value: 0.00000023841848
objectReference: {fileID: 0}
- target: {fileID: 5066517642925019576, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.y
value: 0.383
objectReference: {fileID: 0}
- target: {fileID: 5066517643083346557, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.x
value: 0.00000017881393
objectReference: {fileID: 0}
- target: {fileID: 5066517643083346557, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.z
value: -0.00000005960462
objectReference: {fileID: 0}
- target: {fileID: 5066517643117858766, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.y
value: 0.3050003
- target: {fileID: 4878380427462518, guid: 94dced9d2186d4a76b970fb18ef6d7a6, type: 3}
propertyPath: m_LocalEulerAnglesHint.y
value: 90.00001
- target: {fileID: 5066517643325467277, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.x
value: 0.6999967
objectReference: {fileID: 0}
- target: {fileID: 5066517643325467277, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.y
- target: {fileID: 4878380427462518, guid: 94dced9d2186d4a76b970fb18ef6d7a6, type: 3}
propertyPath: m_LocalEulerAnglesHint.z
objectReference: {fileID: 0}
- target: {fileID: 5066517643325467277, guid: 94dced9d2186d4a76b970fb18ef6d7a6,
type: 3}
propertyPath: m_ConnectedAnchor.z
value: 0.00000005960462
objectReference: {fileID: 0}
m_RemovedComponents: []
m_SourcePrefab: {fileID: 100100000, guid: 94dced9d2186d4a76b970fb18ef6d7a6, type: 3}

122
Project/Assets/ML-Agents/Examples/Walker/Scripts/WalkerAgent.cs


public class WalkerAgent : Agent
{
[Header("Walking Speed")]
[Space(10)]
[Header("Specific to Walker")]
[Space(10)]
[Header("Orientation Cube")]
[Space(10)]
//This will be used as a stable observation platform for the ragdoll to use.
GameObject m_OrientationCube;
[Header("Target To Walk Towards")]
[Space(10)]
public float targetSpawnRadius;
public Transform target;
public Transform ground;
public bool detectTargets;
public bool targetIsStatic;
public bool respawnTargetWhenTouched;
[Header("Target To Walk Towards")] [Space(10)]
public float targetSpawnRadius; //The radius in which a target can be randomly spawned.
public Transform target; //Target the agent will walk towards.
public Transform ground; //Ground gameobject. The height will be used for target spawning
public bool detectTargets; //Should this agent detect targets
public bool respawnTargetWhenTouched; //Should the target respawn to a different position when touched
[Header("Body Parts")]
[Space(10)]
public Transform hips;
[Header("Body Parts")] [Space(10)] public Transform hips;
public Transform chest;
public Transform spine;
public Transform head;

public Transform armR;
public Transform forearmR;
public Transform handR;
JointDriveController m_JdController;
[Header("Orientation")] [Space(10)]
//This will be used as a stable reference point for observations
//Because ragdolls can move erratically, using a standalone reference point can significantly improve learning
public GameObject orientationCube;
Rigidbody m_HipsRb;
Rigidbody m_ChestRb;
Rigidbody m_SpineRb;
JointDriveController m_JdController;
//Spawn an orientation cube
Vector3 oCubePos = hips.position;
oCubePos.y = -.45f;
m_OrientationCube = Instantiate(Resources.Load<GameObject>("OrientationCube"), oCubePos, Quaternion.identity);
m_OrientationCube.transform.SetParent(transform);
UpdateOrientationCube();
m_JdController = GetComponent<JointDriveController>();

m_JdController.SetupBodyPart(forearmR);
m_JdController.SetupBodyPart(handR);
m_HipsRb = hips.GetComponent<Rigidbody>();
m_ChestRb = chest.GetComponent<Rigidbody>();
m_SpineRb = spine.GetComponent<Rigidbody>();
m_ResetParams = Academy.Instance.EnvironmentParameters;
SetResetParameters();

{
//GROUND CHECK
sensor.AddObservation(bp.groundContact.touchingGround ? 1 : 0); // Is this bp touching the ground
sensor.AddObservation(m_OrientationCube.transform.InverseTransformDirection(bp.rb.velocity));
sensor.AddObservation(m_OrientationCube.transform.InverseTransformDirection(bp.rb.angularVelocity));
//Add pos of target relative to orientation cube
sensor.AddObservation(m_OrientationCube.transform.InverseTransformPoint(target.position));
// //Get position relative to hips in the context of our orientation cube's space
// sensor.AddObservation(m_OrientationCube.transform.InverseTransformDirection(bp.rb.position - hips.position));
sensor.AddObservation(orientationCube.transform.InverseTransformDirection(bp.rb.velocity));
sensor.AddObservation(orientationCube.transform.InverseTransformDirection(bp.rb.angularVelocity));
//Get position relative to hips in the context of our orientation cube's space
sensor.AddObservation(orientationCube.transform.InverseTransformDirection(bp.rb.position - hips.position));
if (bp.rb.transform != hips && bp.rb.transform != handL && bp.rb.transform != handR)
{

/// </summary>
public override void CollectObservations(VectorSensor sensor)
{
sensor.AddObservation(Quaternion.FromToRotation(hips.forward, m_OrientationCube.transform.forward));
sensor.AddObservation(Quaternion.FromToRotation(head.forward, m_OrientationCube.transform.forward));
sensor.AddObservation(m_OrientationCube.transform.InverseTransformPoint(target.position));
sensor.AddObservation(Quaternion.FromToRotation(hips.forward, orientationCube.transform.forward));
sensor.AddObservation(Quaternion.FromToRotation(head.forward, orientationCube.transform.forward));
sensor.AddObservation(orientationCube.transform.InverseTransformPoint(target.position));
foreach (var bodyPart in m_JdController.bodyPartsList)
{

void UpdateOrientationCube()
{
//FACING DIR
m_WalkDir = target.position - m_OrientationCube.transform.position;
m_WalkDir = target.position - orientationCube.transform.position;
m_OrientationCube.transform.position = hips.position;
m_OrientationCube.transform.rotation = m_WalkDirLookRot;
orientationCube.transform.position = hips.position;
orientationCube.transform.rotation = m_WalkDirLookRot;
void FixedUpdate()
{
if (detectTargets)

}
}
}
var moveTowardsTargetReward = Vector3.Dot(orientationCube.transform.forward,
Vector3.ClampMagnitude(m_JdController.bodyPartsDict[hips].rb.velocity, maximumWalkingSpeed));
var lookAtTargetReward = Vector3.Dot(orientationCube.transform.forward, head.forward);
var headHeightOverFeetReward = (head.position.y - footL.position.y) + (head.position.y - footR.position.y);
+0.02f * Vector3.Dot(m_OrientationCube.transform.forward,
Vector3.ClampMagnitude(m_JdController.bodyPartsDict[hips].rb.velocity, maximumWalkingSpeed))
+ 0.01f * Vector3.Dot(m_OrientationCube.transform.forward, head.forward)
+ 0.005f * (head.position.y - footL.position.y)
+ 0.005f * (head.position.y - footR.position.y)
+0.02f * moveTowardsTargetReward
+ 0.01f * lookAtTargetReward
+ 0.01f * headHeightOverFeetReward
);
}

AddReward(1f);
if (respawnTargetWhenTouched)
{
GetRandomTargetPos();
MoveTargetToRandomPosition();
}
}

public void GetRandomTargetPos()
public void MoveTargetToRandomPosition()
{
var newTargetPos = Random.insideUnitSphere * targetSpawnRadius;
newTargetPos.y = 5;

{
bodyPart.Reset(bodyPart);
}
if (detectTargets && !targetIsStatic)
if (detectTargets && respawnTargetWhenTouched)
GetRandomTargetPos();
MoveTargetToRandomPosition();
}
SetResetParameters();

{
m_ChestRb.mass = m_ResetParams.GetWithDefault("chest_mass", 8);
m_SpineRb.mass = m_ResetParams.GetWithDefault("spine_mass", 10);
m_HipsRb.mass = m_ResetParams.GetWithDefault("hip_mass", 15);
m_JdController.bodyPartsDict[chest].rb.mass = m_ResetParams.GetWithDefault("chest_mass", 8);
m_JdController.bodyPartsDict[spine].rb.mass = m_ResetParams.GetWithDefault("spine_mass", 8);
m_JdController.bodyPartsDict[hips].rb.mass = m_ResetParams.GetWithDefault("hip_mass", 8);
}
public void SetResetParameters()

private void OnDrawGizmosSelected()
{
if (Application.isPlaying)
{
{
Gizmos.matrix = m_OrientationCube.transform.localToWorldMatrix;
Gizmos.DrawWireCube(Vector3.zero, m_OrientationCube.transform.localScale);
Gizmos.matrix = orientationCube.transform.localToWorldMatrix;
Gizmos.DrawWireCube(Vector3.zero, orientationCube.transform.localScale);
Gizmos.DrawRay(Vector3.zero, Vector3.forward);
}
}
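
The `WalkerAgent` diff above renames the orientation-cube reference to a public `orientationCube`, replaces `GetRandomTargetPos()` with `MoveTargetToRandomPosition()`, and pulls the reward into named terms. A hedged sketch of those reward terms, lifted out into a free-standing helper for illustration (the real code computes them inline in the agent; the class, method, and parameter passing here are invented):

```csharp
using UnityEngine;

public static class WalkerRewardSketch
{
    public static float Compute(Transform orientationCube, Rigidbody hipsRb,
        Transform head, Transform footL, Transform footR, float maximumWalkingSpeed)
    {
        // Velocity toward the target, projected onto the orientation cube's forward axis.
        var moveTowardsTargetReward = Vector3.Dot(orientationCube.forward,
            Vector3.ClampMagnitude(hipsRb.velocity, maximumWalkingSpeed));

        // Facing the target with the head.
        var lookAtTargetReward = Vector3.Dot(orientationCube.forward, head.forward);

        // Head height above both feet, encouraging an upright posture.
        var headHeightOverFeetReward =
            (head.position.y - footL.position.y) + (head.position.y - footR.position.y);

        // Same weights as in the diff; the agent would pass this value to AddReward().
        return 0.02f * moveTowardsTargetReward
             + 0.01f * lookAtTargetReward
             + 0.01f * headHeightOverFeetReward;
    }
}
```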

1001
Project/Assets/ML-Agents/Examples/Walker/TFModels/WalkerDynamic.nn
File diff is too large to display.

2
Project/Assets/ML-Agents/Examples/Walker/TFModels/WalkerDynamic.nn.meta


fileFormatVersion: 2
guid: a4ab86f6c972e43d6bbb52cd0dfffe07
guid: 2cb15010f7cbe4dc59418a5858c87819
ScriptedImporter:
fileIDToRecycleName:
11400000: main obj

1001
Project/Assets/ML-Agents/Examples/Walker/TFModels/WalkerStatic.nn
File diff is too large to display.

2
Project/Assets/ML-Agents/Examples/Walker/TFModels/WalkerStatic.nn.meta


fileFormatVersion: 2
guid: 9091f3caa96b043b1afcc573049ca578
guid: 1cb7f6cc571fb4376b972bc090627b6d
ScriptedImporter:
fileIDToRecycleName:
11400000: main obj

4
README.md


[contribution guidelines](com.unity.ml-agents/CONTRIBUTING.md) and
[code of conduct](CODE_OF_CONDUCT.md).
For problems with the installation and setup of the the ML-Agents Toolkit, or
For problems with the installation and setup of the ML-Agents Toolkit, or
using the ML-Agents Toolkit, or have a specific feature requests, please
using the ML-Agents Toolkit or have a specific feature request, please
[submit a GitHub issue](https://github.com/Unity-Technologies/ml-agents/issues).
Your opinion matters a great deal to us. Only by hearing your thoughts on the

34
com.unity.ml-agents/CHANGELOG.md


[Semantic Versioning](http://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Minor Changes
#### com.unity.ml-agents (C#)
#### ml-agents / ml-agents-envs / gym-unity (Python)
### Bug Fixes
#### com.unity.ml-agents (C#)
#### ml-agents / ml-agents-envs / gym-unity (Python)
## [1.1.0-preview] - 2020-06-10
### Major Changes
#### com.unity.ml-agents (C#)
#### ml-agents / ml-agents-envs / gym-unity (Python)
- Added new Walker environments. Improved ragdoll stability/performance. (#4037)
- `max_step` in the `TerminalStep` and `TerminalSteps` objects was renamed `interrupted`.
- `beta` and `epsilon` in `PPO` are no longer decayed by default but follow the same schedule as learning rate. (#3940)
- `get_behavior_names()` and `get_behavior_spec()` on UnityEnvironment were replaced by the `behavior_specs` property. (#3946)

vector observations to be used simultaneously. (#3981) Thank you @shakenes !
### Minor Changes
#### com.unity.ml-agents (C#)
- `ObservableAttribute` was added. Adding the attribute to fields or properties on an Agent will allow it to generate
observations via reflection. (#3925, #4006)
#### ml-agents / ml-agents-envs / gym-unity (Python)
- Curriculum and Parameter Randomization configurations have been merged
into the main training configuration file. Note that this means training
configuration files are now environment-specific. (#3791)

directory. (#3829)
- When using Curriculum, the current lesson will resume if training is quit and resumed. As such,
the `--lesson` CLI option has been removed. (#4025)
### Minor Changes
#### com.unity.ml-agents (C#)
- `ObservableAttribute` was added. Adding the attribute to fields or properties on an Agent will allow it to generate
observations via reflection. (#3925, #4006)
#### ml-agents / ml-agents-envs / gym-unity (Python)
- The `--save-freq` CLI option has been removed, and replaced by a `checkpoint_interval` option in the trainer configuration YAML. (#4034)
- When trying to load/resume from a checkpoint created with an earlier version of ML-Agents,
a warning will be thrown. (#4035)
- Fixed an issue where SAC would perform too many model updates when resuming from a
checkpoint, and too few when using `buffer_init_steps`. (#4038)
- Fixed a bug in the onnx export that would cause constants needed for inference to not be visible to some versions of
the Barracuda importer. (#4073)
#### com.unity.ml-agents (C#)
#### ml-agents / ml-agents-envs / gym-unity (Python)

10
com.unity.ml-agents/Runtime/Academy.cs


/// Unity package version of com.unity.ml-agents.
/// This must match the version string in package.json and is checked in a unit test.
/// </summary>
internal const string k_PackageVersion = "1.0.2-preview";
internal const string k_PackageVersion = "1.1.0-preview";
const int k_EditorTrainingPort = 5004;

port = port
}
);
Communicator.QuitCommandReceived += OnQuitCommandReceived;
Communicator.ResetCommandReceived += OnResetCommand;
}
if (Communicator != null)

"Will perform inference instead."
);
Communicator = null;
}
if (Communicator != null)
{
Communicator.QuitCommandReceived += OnQuitCommandReceived;
Communicator.ResetCommandReceived += OnResetCommand;
}
}

8
com.unity.ml-agents/Runtime/Communicator/RpcCommunicator.cs


m_Client = new UnityToExternalProto.UnityToExternalProtoClient(channel);
var result = m_Client.Exchange(WrapMessage(unityOutput, 200));
unityInput = m_Client.Exchange(WrapMessage(null, 200)).UnityInput;
var inputMessage = m_Client.Exchange(WrapMessage(null, 200));
unityInput = inputMessage.UnityInput;
if (result.Header.Status != 200 || inputMessage.Header.Status != 200)
{
m_IsOpen = false;
QuitCommandReceived?.Invoke();
}
return result.UnityInput;
#else
throw new UnityAgentsException(

12
com.unity.ml-agents/Runtime/Policies/BehaviorParameters.cs


public enum ObservableAttributeOptions
{
/// <summary>
/// All ObservableAttributes on the Agent will be ignored. If there are no
/// ObservableAttributes on the Agent, this will result in the fastest
/// initialization time.
/// All ObservableAttributes on the Agent will be ignored. This is the
/// default behavior. If there are no ObservableAttributes on the
/// Agent, this will result in the fastest initialization time.
/// inherited are ignored. This is the default behavior, and a reasonable
/// tradeoff between performance and flexibility.
/// inherited are ignored. This is a reasonable tradeoff between
/// performance and flexibility.
/// </summary>
/// <remarks>This corresponds to setting the
/// [BindingFlags.DeclaredOnly](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.bindingflags?view=netcore-3.1)

/// <summary>
/// All members on the class will be examined. This can lead to slower
/// startup times
/// startup times.
/// </summary>
ExamineAll
}
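
The XML doc comments above describe the three `ObservableAttributeOptions` values that the "Observable Attribute Handling" setting on an Agent's Behavior Parameters can take. As a hedged, hypothetical illustration (the agent and property names are made up), an `[Observable]` member only becomes an observation when that setting is not Ignore:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors.Reflection;

// Hypothetical agent for illustration. With ObservableAttributeOptions.Ignore
// (the default) this property is skipped entirely; ExcludeInherited examines only
// members declared on this class; ExamineAll also walks inherited members, at the
// cost of slower startup.
public class ThermostatAgent : Agent
{
    [Observable]
    public float Temperature { get; set; }
}
```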

4
com.unity.ml-agents/Runtime/Sensors/Reflection/EnumReflectionSensor.cs


namespace Unity.MLAgents.Sensors.Reflection
{
internal class EnumReflectionSensor: ReflectionSensorBase
internal class EnumReflectionSensor : ReflectionSensorBase
{
Array m_Values;
bool m_IsFlags;

var enumValue = (Enum)GetReflectedValue();
int i = 0;
foreach(var val in m_Values)
foreach (var val in m_Values)
{
if (m_IsFlags)
{

2
com.unity.ml-agents/package.json


{
"name": "com.unity.ml-agents",
"displayName": "ML Agents",
"version": "1.0.2-preview",
"version": "1.1.0-preview",
"unity": "2018.4",
"description": "Use state-of-the-art machine learning to create intelligent character behaviors in any Unity environment (games, robotics, film, etc.).",
"dependencies": {

2
config/ppo/SoccerTwos.yaml


self_play:
save_steps: 50000
team_change: 200000
swap_steps: 50000
swap_steps: 2000
window: 10
play_against_latest_model_ratio: 0.5
initial_elo: 1200.0

4
config/ppo/StrikersVsGoalie.yaml


self_play:
save_steps: 50000
team_change: 200000
swap_steps: 25000
swap_steps: 1000
window: 10
play_against_latest_model_ratio: 0.5
initial_elo: 1200.0

self_play:
save_steps: 50000
team_change: 200000
swap_steps: 100000
swap_steps: 4000
window: 10
play_against_latest_model_ratio: 0.5
initial_elo: 1200.0

6
config/ppo/Tennis.yaml


Tennis:
trainer_type: ppo
hyperparameters:
batch_size: 1024
buffer_size: 10240
batch_size: 2048
buffer_size: 20480
learning_rate: 0.0003
beta: 0.005
epsilon: 0.2

self_play:
save_steps: 50000
team_change: 100000
swap_steps: 50000
swap_steps: 2000
window: 10
play_against_latest_model_ratio: 0.5
initial_elo: 1200.0

8
config/sac/WallJump.yaml


learning_rate: 0.0003
learning_rate_schedule: constant
batch_size: 128
buffer_size: 50000
buffer_size: 200000
steps_per_update: 10.0
steps_per_update: 20.0
save_replay_buffer: false
init_entcoef: 0.1
reward_signal_steps_per_update: 10.0

strength: 1.0
output_path: default
keep_checkpoints: 5
max_steps: 20000000
max_steps: 15000000
time_horizon: 128
summary_freq: 20000
threaded: true

buffer_size: 50000
buffer_init_steps: 0
tau: 0.005
steps_per_update: 10.0
steps_per_update: 20.0
save_replay_buffer: false
init_entcoef: 0.1
reward_signal_steps_per_update: 10.0

12
config/sac/WalkerDynamic.yaml


behaviors:
Walker:
WalkerDynamic:
batch_size: 256
buffer_size: 500000
batch_size: 1024
buffer_size: 2000000
buffer_init_steps: 0
tau: 0.005
steps_per_update: 30.0

network_settings:
normalize: true
hidden_units: 512
num_layers: 4
hidden_units: 256
num_layers: 3
vis_encode_type: simple
reward_signals:
extrinsic:

keep_checkpoints: 5
max_steps: 20000000
max_steps: 15000000
time_horizon: 1000
summary_freq: 30000
threaded: true

2
docs/Background-Machine-Learning.md


water hose and whether the hose is on or off).
The last remaining piece of the reinforcement learning task is the **reward
signal**. When training a robot to be a mean firefighting machine, we provide it
signal**. The robot is trained to learn a policy that maximizes its overall rewards. When training a robot to be a mean firefighting machine, we provide it
with rewards (positive and negative) indicating how well it is doing on
completing the task. Note that the robot does not _know_ how to put out fires
before it is trained. It learns the objective because it receives a large

14
docs/FAQ.md


search the tensorflow github issues for similar problems and solutions before
creating a new issue.
#### Visual C++ Dependency (Windows Users)
When running `mlagents-learn`, if you see a stack trace with a message like this:
```console
ImportError: DLL load failed: The specified module could not be found.
```
then either of the required DLLs, `msvcp140.dll` (old) or `msvcp140_1.dll` (new), is missing on your machine. The `import tensorflow` command will print this warning message.
To solve it, download and install (with a reboot) the [Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017 and 2019](https://support.microsoft.com/en-my/help/2977003/the-latest-supported-visual-c-downloads).
For more details, please see the [TensorFlow 2.1.0 release notes](https://github.com/tensorflow/tensorflow/releases/tag/v2.1.0)
and the [TensorFlow github issue](https://github.com/tensorflow/tensorflow/issues/22794#issuecomment-573297027).
## Environment Permission Error
If you directly import your Unity environment without building it in the editor,

6
docs/Installation.md


- Install Unity (2018.4 or later)
- Install Python (3.6.1 or higher)
- Clone this repository (Optional)
- __Note:__ If you do not clone the repository, then you will not be
able to access the example environments and training configurations.
Additionally, the [Getting Started Guide](Getting-Started.md) assumes that
you have cloned the repository.
- Install the `com.unity.ml-agents` Unity package
- Install the `mlagents` Python package

order to find it.
**NOTE:** If you do not see the ML-Agents package listed in the Package Manager
please follow the the [advanced installation instructions](#advanced-local-installation-for-development) below.
please follow the [advanced installation instructions](#advanced-local-installation-for-development) below.
#### Advanced: Local Installation for Development

38
docs/Learning-Environment-Create-New.md


```yml
behaviors:
RollerBall:
trainer: ppo
batch_size: 10
beta: 5.0e-3
buffer_size: 100
epsilon: 0.2
hidden_units: 128
lambd: 0.95
learning_rate: 3.0e-4
learning_rate_schedule: linear
max_steps: 5.0e4
memory_size: 128
normalize: false
num_epoch: 3
num_layers: 2
trainer_type: ppo
hyperparameters:
batch_size: 10
buffer_size: 100
learning_rate: 3.0e-4
beta: 5.0e-4
epsilon: 0.2
lambd: 0.99
num_epoch: 3
learning_rate_schedule: linear
network_settings:
normalize: false
hidden_units: 128
num_layers: 2
reward_signals:
extrinsic:
gamma: 0.99
strength: 1.0
max_steps: 500000
use_recurrent: false
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99
```
Since this example creates a very simple training environment with only a few

274
docs/Learning-Environment-Design-Agents.md


- [Decisions](#decisions)
- [Observations and Sensors](#observations-and-sensors)
- [Generating Observations](#generating-observations)
- [Agent.CollectObservations()](#agentcollectobservations)
- [Observable Fields and Properties](#observable-fields-and-properties)
- [ISensor interface and SensorComponents](#isensor-interface-and-sensorcomponents)
- [Stacking](#stacking)
- [Vector Observation Summary & Best Practices](#vector-observation-summary--best-practices)
- [Visual Observations](#visual-observations)
- [Visual Observation Summary & Best Practices](#visual-observation-summary--best-practices)

write your own Policy. If the Agent has a `Model` file, its Policy will use the
neural network `Model` to take decisions.
When you create an Agent, you must extend the base Agent class. This includes
implementing the following methods:
When you create an Agent, you should usually extend the base Agent class. This
includes implementing the following methods:
including at the beginning of the simulation. The Ball3DAgent class uses this
function to reset the agent cube and ball to their starting positions. The
function randomizes the reset values so that the training generalizes to more
than a specific starting position and agent cube attitude.
- `Agent.CollectObservations(VectorSensor sensor)` — Called every simulation
step. Responsible for collecting the Agent's observations of the environment.
Since the Behavior Parameters of the Agent are set with vector observation
space with a state size of 8, the `CollectObservations(VectorSensor sensor)`
must call `VectorSensor.AddObservation()` such that vector size adds up to 8.
including at the beginning of the simulation.
- `Agent.CollectObservations(VectorSensor sensor)` — Called every step that the Agent
requests a decision. This is one possible way for collecting the Agent's
observations of the environment; see [Generating Observations](#generating-observations)
below for more options.
take. Receives the action chosen by the Agent. The vector action spaces result
in a small change in the agent cube's rotation at each step. The
`OnActionReceived()` method assigns a reward to the Agent; in this example, an
Agent receives a small positive reward for each step it keeps the ball on the
agent cube's head and a larger, negative reward for dropping the ball. An
Agent's episode is also ended when it drops the ball so that it will reset
with a new ball for the next simulation step.
take. Receives the action chosen by the Agent. It is also common to assign a
reward in this method.
returns an array of floats. In the case of the Ball 3D Agent, the
`Heuristic()` method converts the keyboard inputs into actions.
writes to a provided array of floats.
As a concrete example, here is how the Ball3DAgent class implements these methods:
- `Agent.OnEpisodeBegin()` — Resets the agent cube and ball to their starting
positions. The function randomizes the reset values so that the training
generalizes to more than a specific starting position and agent cube
orientation.
- `Agent.CollectObservations(VectorSensor sensor)` — Adds information about the
orientation of the agent cube, the ball velocity, and the relative position
between the ball and the cube. Since the `CollectObservations()`
method calls `VectorSensor.AddObservation()` such that vector size adds up to 8,
the Behavior Parameters of the Agent are set with vector observation space
with a state size of 8.
- `Agent.OnActionReceived()` — The vector action spaces result
in a small change in the agent cube's rotation at each step. In this example,
an Agent receives a small positive reward for each step it keeps the ball on the
agent cube's head and a larger, negative reward for dropping the ball. An
Agent's episode is also ended when it drops the ball so that it will reset
with a new ball for the next simulation step.
- `Agent.Heuristic()` - Converts the keyboard inputs into actions.
## Decisions

should call `Agent.RequestDecision()` manually.
## Observations and Sensors
To make informed decisions, an agent must first make observations of the state
of the environment. The observations are collected by Sensors attached to the
agent GameObject. By default, agents come with a `VectorSensor` which allows
them to collect floating-point observations into a single array. There are
additional sensor components which can be attached to the agent GameObject which
collect their own observations, or modify other observations. These are:
- `CameraSensorComponent` - Allows image from `Camera` to be used as
observation.
- `RenderTextureSensorComponent` - Allows content of `RenderTexture` to be used
as observation.
- `RayPerceptionSensorComponent` - Allows information from set of ray-casts to
be used as observation.
In order for an agent to learn, the observations should include all the
information an agent needs to accomplish its task. Without sufficient and
relevant information, an agent may learn poorly or may not learn at all. A
reasonable approach for determining what information should be included is to
consider what you would need to calculate an analytical solution to the problem,
or what you would expect a human to be able to use to solve the problem.
### Vector Observations
### Generating Observations
ML-Agents provides multiple ways for an Agent to make observations:
1. Overriding the `Agent.CollectObservations()` method and passing the
observations to the provided `VectorSensor`.
1. Adding the `[Observable]` attribute to fields and properties on the Agent.
1. Implementing the `ISensor` interface, using a `SensorComponent` attached to
the Agent to create the `ISensor`.
Vector observations are best used for aspects of the environment which are
#### Agent.CollectObservations()
Agent.CollectObservations() is best used for aspects of the environment which are
In order for an agent to learn, the observations should include all the
information an agent needs to accomplish its task. Without sufficient and
relevant information, an agent may learn poorly or may not learn at all. A
reasonable approach for determining what information should be included is to
consider what you would need to calculate an analytical solution to the problem,
or what you would expect a human to be able to use to solve the problem.
The `VectorSensor.AddObservation` method provides a number of overloads for
adding common types of data to your observation vector. You can add Integers and
booleans directly to the observation vector, as well as some common Unity data
types such as `Vector2`, `Vector3`, and `Quaternion`.
state observation. As an experiment, you can remove the velocity components from
the observation and retrain the 3DBall agent. While it will learn to balance the
ball reasonably well, the performance of the agent without using velocity is
noticeably worse.
state observation.
private List<float> state = new List<float>();
// Orientation of the cube (2 floats)
sensor.AddObservation((ball.transform.position.x - gameObject.transform.position.x));
sensor.AddObservation((ball.transform.position.y - gameObject.transform.position.y));
sensor.AddObservation((ball.transform.position.z - gameObject.transform.position.z));
sensor.AddObservation(ball.transform.GetComponent<Rigidbody>().velocity.x);
sensor.AddObservation(ball.transform.GetComponent<Rigidbody>().velocity.y);
sensor.AddObservation(ball.transform.GetComponent<Rigidbody>().velocity.z);
// Relative position of the ball to the cube (3 floats)
sensor.AddObservation(ball.transform.position - gameObject.transform.position);
// Velocity of the ball (3 floats)
sensor.AddObservation(m_BallRb.velocity);
// 8 floats total
The feature vector must always contain the same number of elements and
observations must always be in the same position within the list. If the number
of observed entities in an environment can vary you can pad the feature vector
with zeros for any missing entities in a specific observation or you can limit
As an experiment, you can remove the velocity components from
the observation and retrain the 3DBall agent. While it will learn to balance the
ball reasonably well, the performance of the agent without using velocity is
noticeably worse.
The observations passed to `VectorSensor.AddObservation()` must always contain
the same number of elements and must always be in the same order. If the number
of observed entities in an environment can vary, you can pad the calls
with zeros for any missing entities in a specific observation, or you can limit
every enemy agent in an environment, you could only observe the closest five.
every enemy in an environment, you could only observe the closest five.
When you set up an Agent's `Behavior Parameters` in the Unity Editor, set the
following properties to use a vector observation:
Additionally, when you set up an Agent's `Behavior Parameters` in the Unity
Editor, you must set the **Vector Observations > Space Size**
to equal the number of floats that are written by `CollectObservations()`.
- **Space Size** — The state size must match the length of your feature vector.
#### Observable Fields and Properties
Another approach is to define the relevant observations as fields or properties
on your Agent class, and annotate them with an `ObservableAttribute`. For
example, in the 3DBall example above, the rigid body velocity could be observed
by adding a property to the Agent:
```csharp
using Unity.MLAgents.Sensors.Reflection;
The observation feature vector is a list of floating point numbers, which means
you must convert any other data types to a float or a list of floats.
public class Ball3DAgent : Agent {
The `VectorSensor.AddObservation` method provides a number of overloads for
adding common types of data to your observation vector. You can add Integers and
booleans directly to the observation vector, as well as some common Unity data
types such as `Vector2`, `Vector3`, and `Quaternion`.
[Observable]
public Vector3 RigidBodyVelocity
{
get { return m_BallRb.velocity; }
}
}
```
`ObservableAttribute` currently supports most basic types (e.g. floats, ints,
bools), as well as `Vector2`, `Vector3`, `Vector4`, `Quaternion`, and enums.
The behavior of `ObservableAttribute`s are controlled by the "Observable Attribute
Handling" in the Agent's `Behavior Parameters`. The possible values for this are:
* **Ignore** (default) - All ObservableAttributes on the Agent will be ignored.
If there are no ObservableAttributes on the Agent, this will result in the
fastest initialization time.
* **Exclude Inherited** - Only members on the declared class will be examined;
members that are inherited are ignored. This is a reasonable tradeoff between
performance and flexibility.
* **Examine All** - All members on the class will be examined. This can lead to
slower startup times.
"Exclude Inherited" is generally sufficient, but if your Agent inherits from
another Agent implementation that has Observable members, you will need to use
"Examine All".
Internally, ObservableAttribute uses reflection to determine which members of
the Agent have ObservableAttributes, and also uses reflection to access the
fields or invoke the properties at runtime. This may be slower than using
CollectObservations or an ISensor, although this might not be enough to
noticeably affect performance.
**NOTE**: you do not need to adjust the Space Size in the Agent's
`Behavior Parameters` when you add `[Observable]` fields or properties to an
Agent, since their size can be computed before they are used.
#### ISensor interface and SensorComponents
The `ISensor` interface is generally intended for advanced users. The `Write()`
method is used to actually generate the observation, but some other methods
such as returning the shape of the observations must also be implemented.
The `SensorComponent` abstract class is used to create the actual `ISensor` at
runtime. It must be attached to the same `GameObject` as the `Agent`, or to a
child `GameObject`.
There are several SensorComponents provided in the API:
- `CameraSensorComponent` - Allows image from `Camera` to be used as
observation.
- `RenderTextureSensorComponent` - Allows content of `RenderTexture` to be used
as observation.
- `RayPerceptionSensorComponent` - Allows information from set of ray-casts to
be used as observation.
**NOTE**: you do not need to adjust the Space Size in the Agent's
`Behavior Parameters` when using an `ISensor` or `SensorComponent`.
Internally, both `Agent.CollectObservations` and the `[Observable]` attribute use
`ISensor`s to write observations, although this is mostly abstracted from the user.
### Vector Observations
Both `Agent.CollectObservations()` and `ObservableAttribute`s produce vector
observations, which are represented as lists of `float`s. `ISensor`s can
produce both vector observations and visual observations, which are
multi-dimensional arrays of floats.
Below are some additional considerations when dealing with vector observations:
#### One-hot encoding categorical information

the feature vector. The following code example illustrates how to add.
```csharp
enum CarriedItems { Sword, Shield, Bow, LastItem }
private List<float> state = new List<float>();
enum ItemType { Sword, Shield, Bow, LastItem }
for (int ci = 0; ci < (int)CarriedItems.LastItem; ci++)
for (int ci = 0; ci < (int)ItemType.LastItem; ci++)
{
sensor.AddObservation((int)currentItem == ci ? 1.0f : 0.0f);
}

to the previous one.
```csharp
enum CarriedItems { Sword, Shield, Bow, LastItem }
const int NUM_ITEM_TYPES = (int)CarriedItems.LastItem;
enum ItemType { Sword, Shield, Bow, LastItem }
const int NUM_ITEM_TYPES = (int)ItemType.LastItem;
public override void CollectObservations(VectorSensor sensor)
{

}
```
`ObservableAttribute` has built-in support for enums. Note that you don't need
the `LastItem` placeholder in this case:
```csharp
enum ItemType { Sword, Shield, Bow }
public class HeroAgent : Agent
{
[Observable]
ItemType m_CurrentItem;
}
```

angle, or, if the number of turns is significant, increase the maximum value
used in your normalization formula.
#### Stacking
Stacking refers to repeating observations from previous steps as part of a
larger observation. For example, consider an Agent that generates these
observations in four steps
```
step 1: [0.1]
step 2: [0.2]
step 3: [0.3]
step 4: [0.4]
```
If we use a stack size of 3, the observations would instead be:
```csharp
step 1: [0.1, 0.0, 0.0]
step 2: [0.2, 0.1, 0.0]
step 3: [0.3, 0.2, 0.1]
step 4: [0.4, 0.3, 0.2]
```
(The observations are padded with zeroes for the first `stackSize-1` steps).
This is a simple way to give an Agent limited "memory" without the complexity
of adding a recurrent neural network (RNN).
The steps for enabling stacking depends on how you generate observations:
* For Agent.CollectObservations(), set "Stacked Vectors" on the Agent's
`Behavior Parameters` to a value greater than 1.
* For ObservableAttribute, set the `numStackedObservations` parameter in the
constructor, e.g. `[Observable(numStackedObservations: 2)]`.
* For `ISensor`s, wrap them in a `StackingSensor` (which is also an `ISensor`).
Generally, this should happen in the `CreateSensor()` method of your
`SensorComponent`.
Note that stacking currently only supports for vector observations; stacking
for visual observations is not supported.
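
As a small, hypothetical illustration of the `ObservableAttribute` stacking option mentioned above (the agent and property names are invented):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors.Reflection;

// Stacks the two most recent values of the property, so the policy sees the
// current and the previous reading without a recurrent network.
public class DroneAgent : Agent
{
    [Observable(numStackedObservations: 2)]
    public float Altitude { get; set; }
}
```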
#### Vector Observation Summary & Best Practices
- Vector Observations should include all variables relevant for allowing the

value in the agent GameObject's `Behavior Parameters` should be changed.
- Categorical variables such as type of object (Sword, Shield, Bow) should be
encoded in one-hot fashion (i.e. `3` -> `0, 0, 1`). This can be done
automatically using the `AddOneHotObservation()` method of the `VectorSensor`.
automatically using the `AddOneHotObservation()` method of the `VectorSensor`,
or using `[Observable]` on an enum field or property of the Agent.
- In general, all inputs should be normalized to be in the range 0 to +1 (or -1
to 1). For example, the `x` position information of an agent where the maximum
possible value is `maxValue` should be recorded as

Mathf.Abs(gameObject.transform.position.x - area.transform.position.x) > 8f ||
Mathf.Abs(gameObject.transform.position.z + 5 - area.transform.position.z) > 8)
{
EndEpisode();
EndEpisode();
}
```

10
docs/Learning-Environment-Design.md


1. Calls your Academy's `OnEnvironmentReset` delegate.
1. Calls the `OnEpisodeBegin()` function for each Agent in the scene.
1. Calls the `CollectObservations(VectorSensor sensor)` function for each Agent
in the scene.
1. Gathers information about the scene. This is done by calling the
`CollectObservations(VectorSensor sensor)` function for each Agent in the
scene, as well as updating their sensor and collecting the resulting
observations.
1. Uses each Agent's Policy to decide on the Agent's next action.
1. Calls the `OnActionReceived()` function for each Agent in the scene, passing
in the action chosen by the Agent's Policy.

in a football game or a car object in a vehicle simulation. Every Agent must
have appropriate `Behavior Parameters`.
To create an Agent, extend the Agent class and implement the essential
`CollectObservations(VectorSensor sensor)` and `OnActionReceived()` methods:
Generally, when creating an Agent, you should extend the Agent class and implement
the `CollectObservations(VectorSensor sensor)` and `OnActionReceived()` methods:
- `CollectObservations(VectorSensor sensor)` — Collects the Agent's observation
of its environment.
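
A minimal, hypothetical sketch of that pattern (the class name, observations, and actions below are illustrative only):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class RollerSketchAgent : Agent
{
    Rigidbody m_Body;

    public override void Initialize()
    {
        m_Body = GetComponent<Rigidbody>();
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // 6 floats total; the Behavior Parameters' Space Size must match.
        sensor.AddObservation(transform.localPosition);
        sensor.AddObservation(m_Body.velocity);
    }

    public override void OnActionReceived(float[] vectorAction)
    {
        // Interpret the two continuous actions as a planar force.
        m_Body.AddForce(new Vector3(vectorAction[0], 0f, vectorAction[1]));
    }
}
```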

16
docs/Migrating.md


- `use_visual` and `allow_multiple_visual_obs` in the `UnityToGymWrapper` constructor
were replaced by `allow_multiple_obs` which allows one or more visual observations and
vector observations to be used simultaneously.
- `--save-freq` has been removed from the CLI and is now configurable in the trainer configuration
file.
- `--lesson` has been removed from the CLI. Lessons will resume when using `--resume`.
To start at a different lesson, modify your Curriculum configuration.
- To upgrade your configuration files, an upgrade script has been provided. Run `python config/update_config.py
-h` to see the script usage.
- To upgrade your configuration files, an upgrade script has been provided. Run
`python -m mlagents.trainers.upgrade_config -h` to see the script usage. Note that you will have
had to upgrade to/install the current version of ML-Agents before running the script.
- If your training uses [curriculum](Training-ML-Agents.md#curriculum-learning), move those configurations under
the `Behavior Name` section.
- If your training uses [curriculum](Training-ML-Agents.md#curriculum-learning), move those configurations under a `curriculum` section.
- If your training uses [parameter randomization](Training-ML-Agents.md#environment-parameter-randomization), move
the contents of the sampler config to `parameter_randomization` in the main trainer configuration.
- If you are using `UnityEnvironment` directly, replace `max_step` with `interrupted`

from the constructor and add `allow_multiple_obs = True` if the environment contains either
both visual and vector observations or multiple visual observations.
- If you were setting `--save-freq` in the CLI, add a `checkpoint_interval` value in your
trainer configuration, and set it equal to `save-freq * n_agents_in_scene`.
## Migrating from 0.15 to Release 1

`RayPerception3d.Perceive()` that was causing the `endOffset` to be used
incorrectly. However this may produce different behavior from previous
versions if you use a non-zero `startOffset`. To reproduce the old behavior,
you should increase the the value of `endOffset` by `startOffset`. You can
you should increase the value of `endOffset` by `startOffset`. You can
verify your raycasts are performing as expected in scene view using the debug
rays.
- If you use RayPerception3D, replace it with RayPerceptionSensorComponent3D

3
docs/Training-Configuration-File.md


| `summary_freq` | (default = `50000`) Number of experiences that needs to be collected before generating and displaying training statistics. This determines the granularity of the graphs in Tensorboard. |
| `time_horizon` | (default = `64`) How many steps of experience to collect per-agent before adding it to the experience buffer. When this limit is reached before the end of an episode, a value estimate is used to predict the overall expected reward from the agent's current state. As such, this parameter trades off between a less biased, but higher variance estimate (long time horizon) and more biased, but less varied estimate (short time horizon). In cases where there are frequent rewards within an episode, or episodes are prohibitively large, a smaller number can be more ideal. This number should be large enough to capture all the important behavior within a sequence of an agent's actions. <br><br> Typical range: `32` - `2048` |
| `max_steps` | (default = `500000`) Total number of steps (i.e., observation collected and action taken) that must be taken in the environment (or across all environments if using multiple in parallel) before ending the training process. If you have multiple agents with the same behavior name within your environment, all steps taken by those agents will contribute to the same `max_steps` count. <b