
Merge remote-tracking branch 'origin/master' into develop-add-fire

/develop/add-fire
Arthur Juliani, 5 years ago
Current commit
c577ce26
83 files changed, with 12547 insertions and 5713 deletions
  1. Project/Assets/ML-Agents/Examples/3DBall/Demos/Expert3DBall.demo.meta (2 changes)
  2. Project/Assets/ML-Agents/Examples/3DBall/Demos/Expert3DBallHard.demo.meta (2 changes)
  3. Project/Assets/ML-Agents/Examples/Basic/Demos/ExpertBasic.demo.meta (2 changes)
  4. Project/Assets/ML-Agents/Examples/Bouncer/Demos/ExpertBouncer.demo.meta (2 changes)
  5. Project/Assets/ML-Agents/Examples/Crawler/Demos/ExpertCrawlerDyn.demo.meta (2 changes)
  6. Project/Assets/ML-Agents/Examples/Crawler/Demos/ExpertCrawlerSta.demo.meta (2 changes)
  7. Project/Assets/ML-Agents/Examples/FoodCollector/Demos/ExpertFood.demo.meta (2 changes)
  8. Project/Assets/ML-Agents/Examples/GridWorld/Demos/ExpertGrid.demo.meta (2 changes)
  9. Project/Assets/ML-Agents/Examples/Hallway/Demos/ExpertHallway.demo.meta (2 changes)
  10. Project/Assets/ML-Agents/Examples/PushBlock/Demos/ExpertPush.demo.meta (2 changes)
  11. Project/Assets/ML-Agents/Examples/Pyramids/Demos/ExpertPyramid.demo.meta (2 changes)
  12. Project/Assets/ML-Agents/Examples/Reacher/Demos/ExpertReacher.demo.meta (2 changes)
  13. Project/Assets/ML-Agents/Examples/Soccer/Prefabs/SoccerFieldTwos.prefab (72 changes)
  14. Project/Assets/ML-Agents/Examples/Soccer/Scripts/AgentSoccer.cs (71 changes)
  15. Project/Assets/ML-Agents/Examples/Soccer/Scripts/SoccerFieldArea.cs (12 changes)
  16. Project/Assets/ML-Agents/Examples/Soccer/TFModels/SoccerTwos.nn (1001 changes)
  17. Project/Assets/ML-Agents/Examples/Soccer/TFModels/Goalie.nn.meta (2 changes)
  18. Project/Assets/ML-Agents/Examples/Tennis/Demos/ExpertTennis.demo.meta (2 changes)
  19. Project/Assets/ML-Agents/Examples/Walker/Demos/ExpertWalker.demo.meta (2 changes)
  20. Project/ProjectSettings/ProjectVersion.txt (2 changes)
  21. com.unity.ml-agents/CHANGELOG.md (307 changes)
  22. com.unity.ml-agents/Editor/DemonstrationDrawer.cs (78 changes)
  23. com.unity.ml-agents/Editor/DemonstrationImporter.cs (26 changes)
  24. com.unity.ml-agents/Runtime/Agent.cs (21 changes)
  25. com.unity.ml-agents/Runtime/Communicator/GrpcExtensions.cs (68 changes)
  26. com.unity.ml-agents/Runtime/Demonstrations/DemonstrationWriter.cs (2 changes)
  27. com.unity.ml-agents/Runtime/Policies/HeuristicPolicy.cs (17 changes)
  28. com.unity.ml-agents/Runtime/Timer.cs (35 changes)
  29. com.unity.ml-agents/Tests/Editor/TimerTest.cs (43 changes)
  30. config/trainer_config.yaml (36 changes)
  31. docs/FAQ.md (92 changes)
  32. docs/Getting-Started.md (346 changes)
  33. docs/Installation-Anaconda-Windows.md (120 changes)
  34. docs/Learning-Environment-Create-New.md (489 changes)
  35. docs/Learning-Environment-Design-Agents.md (508 changes)
  36. docs/Learning-Environment-Examples.md (674 changes)
  37. docs/Learning-Environment-Executable.md (84 changes)
  38. docs/ML-Agents-Overview.md (39 changes)
  39. docs/Readme.md (100 changes)
  40. docs/Training-Imitation-Learning.md (6 changes)
  41. docs/Training-ML-Agents.md (368 changes)
  42. docs/Training-Self-Play.md (24 changes)
  43. docs/Training-on-Amazon-Web-Service.md (147 changes)
  44. docs/Training-on-Microsoft-Azure.md (145 changes)
  45. docs/Using-Docker.md (36 changes)
  46. docs/Using-Tensorboard.md (82 changes)
  47. docs/Using-Virtual-Environment.md (66 changes)
  48. docs/images/demo_inspector.png (257 changes)
  49. docs/images/docker_build_settings.png (999 changes)
  50. docs/images/learning_environment_basic.png (198 changes)
  51. docs/images/learning_environment_example.png (545 changes)
  52. docs/images/unity_package_json.png (604 changes)
  53. docs/images/unity_package_manager_window.png (999 changes)
  54. ml-agents/mlagents/trainers/learn.py (95 changes)
  55. utils/validate_versions.py (24 changes)
  56. Project/Assets/ML-Agents/Examples/Soccer/Prefabs/StrikersVsGoalieField.prefab (1001 changes)
  57. Project/Assets/ML-Agents/Examples/Soccer/Prefabs/StrikersVsGoalieField.prefab.meta (8 changes)
  58. Project/Assets/ML-Agents/Examples/Soccer/Scenes/StrikersVsGoalie.unity (919 changes)
  59. Project/Assets/ML-Agents/Examples/Soccer/Scenes/StrikersVsGoalie.unity.meta (8 changes)
  60. Project/Assets/ML-Agents/Examples/Soccer/TFModels/Goalie.nn (1001 changes)
  61. Project/Assets/ML-Agents/Examples/Soccer/TFModels/SoccerTwos.nn.meta (11 changes)
  62. Project/Assets/ML-Agents/Examples/Soccer/TFModels/Striker.nn (1001 changes)
  63. Project/Assets/ML-Agents/Examples/Soccer/TFModels/Striker.nn.meta (11 changes)
  64. com.unity.ml-agents/Runtime/Demonstrations/DemonstrationMetaData.cs (22 changes)
  65. com.unity.ml-agents/Runtime/Demonstrations/DemonstrationMetaData.cs.meta (11 changes)
  66. com.unity.ml-agents/Runtime/Demonstrations/DemonstrationSummary.cs (37 changes)
  67. config/curricula/soccer.yaml (7 changes)
  68. docs/images/learning_environment_full.png (122 changes)
  69. docs/images/roller-ball-agent.png (1001 changes)
  70. docs/images/roller-ball-floor.png (932 changes)
  71. docs/images/roller-ball-hierarchy.png (115 changes)
  72. docs/images/roller-ball-projects.png (163 changes)
  73. docs/images/roller-ball-target.png (803 changes)
  74. docs/images/strikersvsgoalie.png (938 changes)
  75. com.unity.ml-agents/Runtime/Demonstrations/Demonstration.cs (38 changes)
  76. docs/images/mlagents-NewProject.png (86 changes)
  77. docs/images/mlagents-NewTutBlock.png (388 changes)
  78. docs/images/mlagents-NewTutFloor.png (345 changes)
  79. docs/images/mlagents-NewTutSphere.png (333 changes)
  80. docs/Training-on-Microsoft-Azure-Custom-Instance.md (91 changes)
  81. /Project/Assets/ML-Agents/Examples/Soccer/TFModels/SoccerTwos.nn (0 changes)
  82. /Project/Assets/ML-Agents/Examples/Soccer/TFModels/Goalie.nn.meta (0 changes)
  83. /com.unity.ml-agents/Runtime/Demonstrations/DemonstrationSummary.cs.meta (0 changes)

Project/Assets/ML-Agents/Examples/3DBall/Demos/Expert3DBall.demo.meta (2 changes)


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/3DBall/Demos/Expert3DBall.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

Project/Assets/ML-Agents/Examples/3DBall/Demos/Expert3DBallHard.demo.meta (2 changes)


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/3DBall/Demos/Expert3DBallHard.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

Project/Assets/ML-Agents/Examples/Basic/Demos/ExpertBasic.demo.meta (2 changes)


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/Basic/Demos/ExpertBasic.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

Project/Assets/ML-Agents/Examples/Bouncer/Demos/ExpertBouncer.demo.meta (2 changes)


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/Bouncer/Demos/ExpertBouncer.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

Project/Assets/ML-Agents/Examples/Crawler/Demos/ExpertCrawlerDyn.demo.meta (2 changes)


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/Crawler/Demos/ExpertCrawlerDyn.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

Project/Assets/ML-Agents/Examples/Crawler/Demos/ExpertCrawlerSta.demo.meta (2 changes)


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/Crawler/Demos/ExpertCrawlerSta.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

Project/Assets/ML-Agents/Examples/FoodCollector/Demos/ExpertFood.demo.meta (2 changes)


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/FoodCollector/Demos/ExpertFood.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

Project/Assets/ML-Agents/Examples/GridWorld/Demos/ExpertGrid.demo.meta (2 changes)


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/GridWorld/Demos/ExpertGrid.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

Project/Assets/ML-Agents/Examples/Hallway/Demos/ExpertHallway.demo.meta (2 changes)


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/Hallway/Demos/ExpertHallway.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

Project/Assets/ML-Agents/Examples/PushBlock/Demos/ExpertPush.demo.meta (2 changes)


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/PushBlock/Demos/ExpertPush.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

Project/Assets/ML-Agents/Examples/Pyramids/Demos/ExpertPyramid.demo.meta (2 changes)


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/Pyramids/Demos/ExpertPyramid.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

Project/Assets/ML-Agents/Examples/Reacher/Demos/ExpertReacher.demo.meta (2 changes)


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/Reacher/Demos/ExpertReacher.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

Project/Assets/ML-Agents/Examples/Soccer/Prefabs/SoccerFieldTwos.prefab (72 changes)


- component: {fileID: 114492261207303438}
- component: {fileID: 114320493772006642}
- component: {fileID: 9152743230243588598}
- component: {fileID: 5530675298926254831}
m_Layer: 0
m_Name: PurpleStriker
m_TagString: purpleAgent

vectorActionSize: 030000000300000003000000
vectorActionDescriptions: []
vectorActionSpaceType: 0
m_Model: {fileID: 11400000, guid: 9d26b71f04a2d4680a68d8de4f6b62e9, type: 3}
m_Model: {fileID: 11400000, guid: b0a629580a0ab48a5a774f90ff1fb48b, type: 3}
m_BehaviorName: Soccer
m_BehaviorName: SoccerTwos
TeamId: 1
m_UseChildSensors: 1
--- !u!114 &114492261207303438

maxStep: 3000
team: 0
area: {fileID: 114559182131992928}
position: 2
agentRb: {fileID: 0}
--- !u!114 &114320493772006642
MonoBehaviour:

DecisionPeriod: 5
TakeActionsBetweenDecisions: 1
offsetStep: 0
--- !u!114 &5530675298926254831
MonoBehaviour:
m_ObjectHideFlags: 0
m_CorrespondingSourceObject: {fileID: 0}
m_PrefabInstance: {fileID: 0}
m_PrefabAsset: {fileID: 0}
m_GameObject: {fileID: 1095606497496374}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 3a6da8f78a394c6ab027688eab81e04d, type: 3}
m_Name:
m_EditorClassIdentifier:
--- !u!1 &1100217258374548
GameObject:
m_ObjectHideFlags: 0

- component: {fileID: 114850431417842684}
- component: {fileID: 114516244030127556}
- component: {fileID: 404683423509059512}
- component: {fileID: 2668741801881409108}
m_Layer: 0
m_Name: BlueStriker
m_TagString: blueAgent

vectorActionSize: 030000000300000003000000
vectorActionDescriptions: []
vectorActionSpaceType: 0
m_Model: {fileID: 11400000, guid: 9d26b71f04a2d4680a68d8de4f6b62e9, type: 3}
m_Model: {fileID: 11400000, guid: b0a629580a0ab48a5a774f90ff1fb48b, type: 3}
m_BehaviorName: Soccer
m_BehaviorName: SoccerTwos
TeamId: 0
m_UseChildSensors: 1
--- !u!114 &114850431417842684

maxStep: 3000
team: 1
area: {fileID: 114559182131992928}
position: 2
agentRb: {fileID: 0}
--- !u!114 &114516244030127556
MonoBehaviour:

DecisionPeriod: 5
TakeActionsBetweenDecisions: 1
offsetStep: 0
--- !u!114 &2668741801881409108
MonoBehaviour:
m_ObjectHideFlags: 0
m_CorrespondingSourceObject: {fileID: 0}
m_PrefabInstance: {fileID: 0}
m_PrefabAsset: {fileID: 0}
m_GameObject: {fileID: 1131626411948014}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 3a6da8f78a394c6ab027688eab81e04d, type: 3}
m_Name:
m_EditorClassIdentifier:
--- !u!1 &1141134673700168
GameObject:
m_ObjectHideFlags: 0

- component: {fileID: 5320024511406682322}
- component: {fileID: 1023485123796557062}
- component: {fileID: 8734522883866558980}
- component: {fileID: 2436210718391481760}
m_Layer: 0
m_Name: PurpleStriker (1)
m_TagString: purpleAgent

vectorActionSize: 030000000300000003000000
vectorActionDescriptions: []
vectorActionSpaceType: 0
m_Model: {fileID: 11400000, guid: 9d26b71f04a2d4680a68d8de4f6b62e9, type: 3}
m_Model: {fileID: 11400000, guid: b0a629580a0ab48a5a774f90ff1fb48b, type: 3}
m_BehaviorName: Soccer
m_BehaviorName: SoccerTwos
TeamId: 1
m_UseChildSensors: 1
--- !u!114 &5320024511406682322

maxStep: 3000
team: 0
area: {fileID: 114559182131992928}
position: 2
agentRb: {fileID: 0}
--- !u!114 &1023485123796557062
MonoBehaviour:

DecisionPeriod: 5
TakeActionsBetweenDecisions: 1
offsetStep: 0
--- !u!114 &2436210718391481760
MonoBehaviour:
m_ObjectHideFlags: 0
m_CorrespondingSourceObject: {fileID: 0}
m_PrefabInstance: {fileID: 0}
m_PrefabAsset: {fileID: 0}
m_GameObject: {fileID: 6257467487437560250}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 3a6da8f78a394c6ab027688eab81e04d, type: 3}
m_Name:
m_EditorClassIdentifier:
--- !u!1 &6442519122303792292
GameObject:
m_ObjectHideFlags: 0

- component: {fileID: 5379409612883756837}
- component: {fileID: 2562571719799803906}
- component: {fileID: 1018414316889932458}
- component: {fileID: 5288255359135781773}
m_Layer: 0
m_Name: BlueStriker (1)
m_TagString: blueAgent

vectorActionSize: 030000000300000003000000
vectorActionDescriptions: []
vectorActionSpaceType: 0
m_Model: {fileID: 11400000, guid: 9d26b71f04a2d4680a68d8de4f6b62e9, type: 3}
m_Model: {fileID: 11400000, guid: b0a629580a0ab48a5a774f90ff1fb48b, type: 3}
m_BehaviorName: Soccer
m_BehaviorName: SoccerTwos
TeamId: 0
m_UseChildSensors: 1
--- !u!114 &5379409612883756837

maxStep: 3000
team: 1
area: {fileID: 114559182131992928}
position: 2
agentRb: {fileID: 0}
--- !u!114 &2562571719799803906
MonoBehaviour:

DecisionPeriod: 5
TakeActionsBetweenDecisions: 1
offsetStep: 0
--- !u!114 &5288255359135781773
MonoBehaviour:
m_ObjectHideFlags: 0
m_CorrespondingSourceObject: {fileID: 0}
m_PrefabInstance: {fileID: 0}
m_PrefabAsset: {fileID: 0}
m_GameObject: {fileID: 8360301818957399454}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 3a6da8f78a394c6ab027688eab81e04d, type: 3}
m_Name:
m_EditorClassIdentifier:
--- !u!1 &8673569163220857793
GameObject:
m_ObjectHideFlags: 0

Project/Assets/ML-Agents/Examples/Soccer/Scripts/AgentSoccer.cs (71 changes)


using UnityEngine;
using MLAgents;
using MLAgents.Policies;
using MLAgents.SideChannels;
public class AgentSoccer : Agent
{

Purple = 1
}
public enum Position
{
Striker,
Goalie,
Generic
}
// The coefficient for the reward for colliding with a ball. Set using curriculum.
float m_BallTouch;
public Position position;
const float k_Power = 2000f;
float m_Existential;
float m_LateralSpeed;
float m_ForwardSpeed;
[HideInInspector]
public float timePenalty = 0;
[HideInInspector]
public Rigidbody agentRb;

public override void Initialize()
{
m_Existential = 1f / maxStep;
m_BehaviorParameters = gameObject.GetComponent<BehaviorParameters>();
if (m_BehaviorParameters.TeamId == (int)Team.Blue)
{

team = Team.Purple;
m_Transform = new Vector3(transform.position.x + 4f, .5f, transform.position.z);
}
if (position == Position.Goalie)
{
m_LateralSpeed = 1.0f;
m_ForwardSpeed = 1.0f;
}
else if (position == Position.Striker)
{
m_LateralSpeed = 0.3f;
m_ForwardSpeed = 1.3f;
}
else
{
m_LateralSpeed = 0.3f;
m_ForwardSpeed = 1.0f;
}
m_SoccerSettings = FindObjectOfType<SoccerSettings>();
agentRb = GetComponent<Rigidbody>();
agentRb.maxAngularVelocity = 500;

switch (forwardAxis)
{
case 1:
dirToGo = transform.forward * 1f;
dirToGo = transform.forward * m_ForwardSpeed;
dirToGo = transform.forward * -1f;
dirToGo = transform.forward * -m_ForwardSpeed;
break;
}

dirToGo = transform.right * 0.3f;
dirToGo = transform.right * m_LateralSpeed;
dirToGo = transform.right * -0.3f;
dirToGo = transform.right * -m_LateralSpeed;
break;
}

public override void OnActionReceived(float[] vectorAction)
{
// Existential penalty for strikers.
AddReward(-1f / 3000f);
if (position == Position.Goalie)
{
// Existential bonus for Goalies.
AddReward(m_Existential);
}
else if (position == Position.Striker)
{
// Existential penalty for Strikers
AddReward(-m_Existential);
}
else
{
// Existential penalty cumulant for Generic
timePenalty -= m_Existential;
}
MoveAgent(vectorAction);
}

/// </summary>
void OnCollisionEnter(Collision c)
{
var force = 2000f * m_KickPower;
var force = k_Power * m_KickPower;
if (position == Position.Goalie)
{
force = k_Power;
}
AddReward(.2f * m_BallTouch);
var dir = c.contacts[0].point - transform.position;
dir = dir.normalized;
c.gameObject.GetComponent<Rigidbody>().AddForce(dir * force);

public override void OnEpisodeBegin()
{
timePenalty = 0;
m_BallTouch = SideChannelUtils.GetSideChannel<FloatPropertiesChannel>().GetPropertyWithDefault("ball_touch", 0);
if (team == Team.Purple)
{
transform.rotation = Quaternion.Euler(0f, -90f, 0f);

Project/Assets/ML-Agents/Examples/Soccer/Scripts/SoccerFieldArea.cs (12 changes)


{
if (ps.agentScript.team == scoredTeam)
{
ps.agentScript.AddReward(1);
ps.agentScript.AddReward(1 + ps.agentScript.timePenalty);
}
else
{

}
}
public Vector3 GetBallSpawnPosition()
{
var randomSpawnPos = ground.transform.position +
new Vector3(0f, 0f, 0f);
randomSpawnPos.y = ground.transform.position.y + .5f;
return randomSpawnPos;
}
ball.transform.position = GetBallSpawnPosition();
ball.transform.position = ballStartingPos;
ballRb.velocity = Vector3.zero;
ballRb.angularVelocity = Vector3.zero;
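The reward change above ties back to the `timePenalty` field added in AgentSoccer.cs: a player in the Generic position accumulates `-m_Existential` (i.e. `-1 / maxStep`) every step, and when its team scores it receives `1 + timePenalty` instead of a flat `1`. A minimal sketch of that interaction, using only the names visible in the diff (everything else is illustrative):

```csharp
// Sketch: how the per-step existential penalty and the goal reward combine.
public class SoccerRewardSketch
{
    float m_Existential = 1f / 3000f;   // 1 / maxStep, as set in AgentSoccer.Initialize()
    public float timePenalty;           // reset to 0 in OnEpisodeBegin()

    // Called once per decision step for the Generic position.
    public void OnStep()
    {
        timePenalty -= m_Existential;
    }

    // Reward handed out by SoccerFieldArea when this agent's team scores.
    public float GoalReward()
    {
        return 1f + timePenalty;        // shrinks the longer the episode has run
    }
}
```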

Project/Assets/ML-Agents/Examples/Soccer/TFModels/SoccerTwos.nn (1001 changes)
File diff suppressed because it is too large.

Project/Assets/ML-Agents/Examples/Soccer/TFModels/Goalie.nn.meta (2 changes)


fileFormatVersion: 2
guid: 9d26b71f04a2d4680a68d8de4f6b62e9
guid: e9c10c18f4eb745d19186a54dbe3ca2e
ScriptedImporter:
fileIDToRecycleName:
11400000: main obj

Project/Assets/ML-Agents/Examples/Tennis/Demos/ExpertTennis.demo.meta (2 changes)


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/Tennis/Demos/ExpertTennis.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

Project/Assets/ML-Agents/Examples/Walker/Demos/ExpertWalker.demo.meta (2 changes)


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/Walker/Demos/ExpertWalker.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

Project/ProjectSettings/ProjectVersion.txt (2 changes)


m_EditorVersion: 2018.4.18f1
m_EditorVersion: 2018.4.17f1

com.unity.ml-agents/CHANGELOG.md (307 changes)


# Changelog
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
and this project adheres to
[Semantic Versioning](http://semver.org/spec/v2.0.0.html).
- The `--load` and `--train` command-line flags have been deprecated. Training now happens by default, and
use `--resume` to resume training instead. (#3705)
- The Jupyter notebooks have been removed from the repository.
- Introduced the `SideChannelUtils` to register, unregister and access side channels.
- `Academy.FloatProperties` was removed, please use `SideChannelUtils.GetSideChannel<FloatPropertiesChannel>()` instead.
- Removed the multi-agent gym option from the gym wrapper. For multi-agent scenarios, use the [Low Level Python API](Python-API.md).
- The low level Python API has changed. You can look at the document [Low Level Python API documentation](Python-API.md) for more information. If you use `mlagents-learn` for training, this should be a transparent change.
- Added ability to start training (initialize model weights) from a previous run ID. (#3710)
- The internal event `Academy.AgentSetStatus` was renamed to `Academy.AgentPreStep` and made public.
- The offset logic was removed from DecisionRequester.
- The signature of `Agent.Heuristic()` was changed to take a `float[]` as a parameter, instead of returning the array. This was done to prevent a common source of error where users would return arrays of the wrong size.
- The communication API version has been bumped up to 1.0.0 and will use [Semantic Versioning](https://semver.org/) to do compatibility checks for communication between Unity and the Python process.
- The obsolete `Agent` methods `GiveModel`, `Done`, `InitializeAgent`, `AgentAction` and `AgentReset` have been removed.
- The `--load` and `--train` command-line flags have been deprecated. Training
now happens by default, and use `--resume` to resume training instead. (#3705)
- The Jupyter notebooks have been removed from the repository.
- Introduced the `SideChannelUtils` to register, unregister and access side
channels.
- `Academy.FloatProperties` was removed, please use
`SideChannelUtils.GetSideChannel<FloatPropertiesChannel>()` instead.
- Removed the multi-agent gym option from the gym wrapper. For multi-agent
scenarios, use the [Low Level Python API](../docs/Python-API.md).
- The low level Python API has changed. You can look at the document
[Low Level Python API documentation](../docs/Python-API.md) for more
information. If you use `mlagents-learn` for training, this should be a
transparent change.
- Added ability to start training (initialize model weights) from a previous run
ID. (#3710)
- The internal event `Academy.AgentSetStatus` was renamed to
`Academy.AgentPreStep` and made public.
- The offset logic was removed from DecisionRequester.
- The signature of `Agent.Heuristic()` was changed to take a `float[]` as a
parameter, instead of returning the array. This was done to prevent a common
source of error where users would return arrays of the wrong size.
- The communication API version has been bumped up to 1.0.0 and will use
[Semantic Versioning](https://semver.org/) to do compatibility checks for
communication between Unity and the Python process.
- The obsolete `Agent` methods `GiveModel`, `Done`, `InitializeAgent`,
`AgentAction` and `AgentReset` have been removed.
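Two of the changes above are easiest to see in code: the `SideChannelUtils` accessor that replaces `Academy.FloatProperties`, and the new `Agent.Heuristic(float[])` signature that fills a caller-provided buffer instead of returning an array. A hedged sketch (the property key `"my_param"` is made up for illustration):

```csharp
using MLAgents;
using MLAgents.SideChannels;

public class ExampleAgent : Agent
{
    float m_MyParam;

    public override void OnEpisodeBegin()
    {
        // Replaces the removed Academy.FloatProperties accessor.
        m_MyParam = SideChannelUtils
            .GetSideChannel<FloatPropertiesChannel>()
            .GetPropertyWithDefault("my_param", 1.0f);   // hypothetical key
    }

    // New signature: write into actionsOut rather than returning a new array,
    // which removes a common source of wrong-sized action arrays.
    public override void Heuristic(float[] actionsOut)
    {
        actionsOut[0] = 1f;
    }
}
```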
- The GhostTrainer has been extended to support asymmetric games and the asymmetric example environment Strikers Vs. Goalie has been added.
- Format of console output has changed slightly and now matches the name of the model/summary directory. (#3630, #3616)
- Added a feature to allow sending stats from C# environments to TensorBoard (and other python StatsWriters). To do this from your code, use `SideChannelUtils.GetSideChannel<StatsSideChannel>().AddStat(key, value)` (#3660)
- Renamed 'Generalization' feature to 'Environment Parameter Randomization'.
- Timer files now contain a dictionary of metadata, including things like the package version numbers.
- SideChannel IncomingMessages methods now take an optional default argument, which is used when trying to read more data than the message contains.
- The way that UnityEnvironment decides the port was changed. If no port is specified, the behavior will depend on the `file_name` parameter. If it is `None`, 5004 (the editor port) will be used; otherwise 5005 (the base environment port) will be used.
- Fixed an issue where exceptions from environments provided a returncode of 0. (#3680)
- Running `mlagents-learn` with the same `--run-id` twice will no longer overwrite the existing files. (#3705)
- `StackingSensor` was changed from `internal` visibility to `public`
- Updated Barracuda to 0.6.3-preview.
- Format of console output has changed slightly and now matches the name of the
model/summary directory. (#3630, #3616)
- Added a feature to allow sending stats from C# environments to TensorBoard
(and other python StatsWriters). To do this from your code, use
`SideChannelUtils.GetSideChannel<StatsSideChannel>().AddStat(key, value)`
(#3660)
- Renamed 'Generalization' feature to 'Environment Parameter Randomization'.
- Timer files now contain a dictionary of metadata, including things like the
package version numbers.
- SideChannel IncomingMessages methods now take an optional default argument,
which is used when trying to read more data than the message contains.
- The way that UnityEnvironment decides the port was changed. If no port is
specified, the behavior will depend on the `file_name` parameter. If it is
`None`, 5004 (the editor port) will be used; otherwise 5005 (the base
environment port) will be used.
- Fixed an issue where exceptions from environments provided a returncode of 0.
(#3680)
- Running `mlagents-learn` with the same `--run-id` twice will no longer
overwrite the existing files. (#3705)
- `StackingSensor` was changed from `internal` visibility to `public`
- Updated Barracuda to 0.6.3-preview.
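The C#-to-TensorBoard stats feature above only needs the one call quoted in the changelog; a minimal sketch (the stat name is illustrative):

```csharp
using MLAgents;
using MLAgents.SideChannels;
using UnityEngine;

public class StatsReporter : MonoBehaviour
{
    void FixedUpdate()
    {
        // Sends a custom scalar to TensorBoard and any other Python StatsWriters.
        SideChannelUtils
            .GetSideChannel<StatsSideChannel>()
            .AddStat("Example/Distance To Goal", transform.position.magnitude);
    }
}
```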
### Bug Fixes
- Fixed a display bug when viewing Demonstration files in the inspector. The
shapes of the observations in the file now display correctly. (#3771)
- Raise the wall in CrawlerStatic scene to prevent Agent from falling off. (#3650)
- Fixed an issue where specifying `vis_encode_type` was required only for SAC. (#3677)
- Fixed the reported entropy values for continuous actions (#3684)
- Fixed an issue where switching models using `SetModel()` during training would use an excessive amount of memory. (#3664)
- Environment subprocesses now close immediately on timeout or wrong API version. (#3679)
- Fixed an issue in the gym wrapper that would raise an exception if an Agent called EndEpisode multiple times in the same step. (#3700)
- Fixed an issue where logging output was not visible; logging levels are now set consistently. (#3703)
- Raise the wall in CrawlerStatic scene to prevent Agent from falling off.
(#3650)
- Fixed an issue where specifying `vis_encode_type` was required only for SAC.
(#3677)
- Fixed the reported entropy values for continuous actions (#3684)
- Fixed an issue where switching models using `SetModel()` during training would
use an excessive amount of memory. (#3664)
- Environment subprocesses now close immediately on timeout or wrong API
version. (#3679)
- Fixed an issue in the gym wrapper that would raise an exception if an Agent
called EndEpisode multiple times in the same step. (#3700)
- Fixed an issue where logging output was not visible; logging levels are now
set consistently. (#3703)
- `Agent.CollectObservations` now takes a VectorSensor argument. (#3352, #3389)
- Added `Agent.CollectDiscreteActionMasks` virtual method with a `DiscreteActionMasker` argument to specify which discrete actions are unavailable to the Agent. (#3525)
- Beta support for ONNX export was added. If the `tf2onnx` python package is installed, models will be saved to `.onnx` as well as `.nn` format.
Note that Barracuda 0.6.0 or later is required to import the `.onnx` files properly
- Multi-GPU training and the `--multi-gpu` option has been removed temporarily. (#3345)
- All Sensor related code has been moved to the namespace `MLAgents.Sensors`.
- All SideChannel related code has been moved to the namespace `MLAgents.SideChannels`.
- `BrainParameters` and `SpaceType` have been removed from the public API
- `BehaviorParameters` have been removed from the public API.
- The following methods in the `Agent` class have been deprecated and will be removed in a later release:
- `InitializeAgent()` was renamed to `Initialize()`
- `AgentAction()` was renamed to `OnActionReceived()`
- `AgentReset()` was renamed to `OnEpisodeBegin()`
- `Done()` was renamed to `EndEpisode()`
- `GiveModel()` was renamed to `SetModel()`
- `Agent.CollectObservations` now takes a VectorSensor argument. (#3352, #3389)
- Added `Agent.CollectDiscreteActionMasks` virtual method with a
`DiscreteActionMasker` argument to specify which discrete actions are
unavailable to the Agent. (#3525)
- Beta support for ONNX export was added. If the `tf2onnx` python package is
installed, models will be saved to `.onnx` as well as `.nn` format. Note that
Barracuda 0.6.0 or later is required to import the `.onnx` files properly
- Multi-GPU training and the `--multi-gpu` option has been removed temporarily.
(#3345)
- All Sensor related code has been moved to the namespace `MLAgents.Sensors`.
- All SideChannel related code has been moved to the namespace
`MLAgents.SideChannels`.
- `BrainParameters` and `SpaceType` have been removed from the public API
- `BehaviorParameters` have been removed from the public API.
- The following methods in the `Agent` class have been deprecated and will be
removed in a later release:
- `InitializeAgent()` was renamed to `Initialize()`
- `AgentAction()` was renamed to `OnActionReceived()`
- `AgentReset()` was renamed to `OnEpisodeBegin()`
- `Done()` was renamed to `EndEpisode()`
- `GiveModel()` was renamed to `SetModel()`
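To make the renames concrete, a skeletal Agent written against the new API might look like the sketch below; the `VectorSensor`-based `CollectObservations` and the renamed lifecycle methods come straight from the entries above, while the rigidbody details are placeholders:

```csharp
using MLAgents;
using MLAgents.Sensors;
using UnityEngine;

public class RenamedAgent : Agent
{
    Rigidbody m_Body;   // placeholder state for the example

    public override void Initialize()                              // was InitializeAgent()
    {
        m_Body = GetComponent<Rigidbody>();
    }

    public override void CollectObservations(VectorSensor sensor)  // now takes a VectorSensor
    {
        sensor.AddObservation(m_Body.velocity);
    }

    public override void OnActionReceived(float[] vectorAction)    // was AgentAction()
    {
        if (vectorAction[0] > 0.5f)
        {
            EndEpisode();                                           // was Done()
        }
    }

    public override void OnEpisodeBegin()                           // was AgentReset()
    {
        m_Body.velocity = Vector3.zero;
    }
}
```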
- Monitor.cs was moved to Examples. (#3372)
- Automatic stepping for Academy is now controlled from the AutomaticSteppingEnabled property. (#3376)
- The GetEpisodeCount, GetStepCount, GetTotalStepCount and methods of Academy were changed to EpisodeCount, StepCount, TotalStepCount properties respectively. (#3376)
- Several classes were changed from public to internal visibility. (#3390)
- Academy.RegisterSideChannel and UnregisterSideChannel methods were added. (#3391)
- A tutorial on adding custom SideChannels was added (#3391)
- The stepping logic for the Agent and the Academy has been simplified (#3448)
- Update Barracuda to 0.6.1-preview
* The interface for `RayPerceptionSensor.PerceiveStatic()` was changed to take an input class and write to an output class, and the method was renamed to `Perceive()`.
- The checkpoint file suffix was changed from `.cptk` to `.ckpt` (#3470)
- The command-line argument used to determine the port that an environment will listen on was changed from `--port` to `--mlagents-port`.
- `DemonstrationRecorder` can now record observations outside of the editor.
- `DemonstrationRecorder` now has an optional path for the demonstrations. This will default to `Application.dataPath` if not set.
- `DemonstrationStore` was changed to accept a `Stream` for its constructor, and was renamed to `DemonstrationWriter`
- The method `GetStepCount()` on the Agent class has been replaced with the property getter `StepCount`
- `RayPerceptionSensorComponent` and related classes now display the debug gizmos whenever the Agent is selected (not just Play mode).
- Most fields on `RayPerceptionSensorComponent` can now be changed while the editor is in Play mode. The exceptions to this are fields that affect the number of observations.
- Most fields on `CameraSensorComponent` and `RenderTextureSensorComponent` were changed to private and replaced by properties with the same name.
- Unused static methods from the `Utilities` class (ShiftLeft, ReplaceRange, AddRangeNoAlloc, and GetSensorFloatObservationSize) were removed.
- The `Agent` class is no longer abstract.
- SensorBase was moved out of the package and into the Examples directory.
- `AgentInfo.actionMasks` has been renamed to `AgentInfo.discreteActionMasks`.
- `DecisionRequester` has been made internal (you can still use the DecisionRequesterComponent from the inspector). `RepeatAction` was renamed `TakeActionsBetweenDecisions` for clarity. (#3555)
- The `IFloatProperties` interface has been removed.
- Fix #3579.
- Improved inference performance for models with multiple action branches. (#3598)
- Fixed an issue when using GAIL with less than `batch_size` number of demonstrations. (#3591)
- The interfaces to the `SideChannel` classes (on C# and python) have changed to use new `IncomingMessage` and `OutgoingMessage` classes. These should make reading and writing data to the channel easier. (#3596)
- Updated the ExpertPyramid.demo example demonstration file (#3613)
- Updated project version for example environments to 2018.4.18f1. (#3618)
- Changed the Product Name in the example environments to remove spaces, so that the default build executable file doesn't contain spaces. (#3612)
- Monitor.cs was moved to Examples. (#3372)
- Automatic stepping for Academy is now controlled from the
AutomaticSteppingEnabled property. (#3376)
- The GetEpisodeCount, GetStepCount, GetTotalStepCount and methods of Academy
were changed to EpisodeCount, StepCount, TotalStepCount properties
respectively. (#3376)
- Several classes were changed from public to internal visibility. (#3390)
- Academy.RegisterSideChannel and UnregisterSideChannel methods were added.
(#3391)
- A tutorial on adding custom SideChannels was added (#3391)
- The stepping logic for the Agent and the Academy has been simplified (#3448)
- Update Barracuda to 0.6.1-preview
* The interface for `RayPerceptionSensor.PerceiveStatic()` was changed to take
an input class and write to an output class, and the method was renamed to
`Perceive()`.
- The checkpoint file suffix was changed from `.cptk` to `.ckpt` (#3470)
- The command-line argument used to determine the port that an environment will
listen on was changed from `--port` to `--mlagents-port`.
- `DemonstrationRecorder` can now record observations outside of the editor.
- `DemonstrationRecorder` now has an optional path for the demonstrations. This
will default to `Application.dataPath` if not set.
- `DemonstrationStore` was changed to accept a `Stream` for its constructor, and
was renamed to `DemonstrationWriter`
- The method `GetStepCount()` on the Agent class has been replaced with the
property getter `StepCount`
- `RayPerceptionSensorComponent` and related classes now display the debug
gizmos whenever the Agent is selected (not just Play mode).
- Most fields on `RayPerceptionSensorComponent` can now be changed while the
editor is in Play mode. The exceptions to this are fields that affect the
number of observations.
- Most fields on `CameraSensorComponent` and `RenderTextureSensorComponent` were
changed to private and replaced by properties with the same name.
- Unused static methods from the `Utilities` class (ShiftLeft, ReplaceRange,
AddRangeNoAlloc, and GetSensorFloatObservationSize) were removed.
- The `Agent` class is no longer abstract.
- SensorBase was moved out of the package and into the Examples directory.
- `AgentInfo.actionMasks` has been renamed to `AgentInfo.discreteActionMasks`.
- `DecisionRequester` has been made internal (you can still use the
DecisionRequesterComponent from the inspector). `RepeatAction` was renamed
`TakeActionsBetweenDecisions` for clarity. (#3555)
- The `IFloatProperties` interface has been removed.
- Fix #3579.
- Improved inference performance for models with multiple action branches.
(#3598)
- Fixed an issue when using GAIL with less than `batch_size` number of
demonstrations. (#3591)
- The interfaces to the `SideChannel` classes (on C# and python) have changed to
use new `IncomingMessage` and `OutgoingMessage` classes. These should make
reading and writing data to the channel easier. (#3596)
- Updated the ExpertPyramid.demo example demonstration file (#3613)
- Updated project version for example environments to 2018.4.18f1. (#3618)
- Changed the Product Name in the example environments to remove spaces, so that
the default build executable file doesn't contain spaces. (#3612)
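The `IncomingMessage`/`OutgoingMessage` change is easiest to see in a small custom channel. The sketch below assumes registration goes through `SideChannelUtils` (the changelog above only states that `SideChannelUtils` handles register/unregister, so treat the exact call as an assumption); the GUID and message contents are made up:

```csharp
using System;
using MLAgents;
using MLAgents.SideChannels;
using UnityEngine;

public class StringLogChannel : SideChannel
{
    public StringLogChannel()
    {
        // Arbitrary GUID identifying this channel on both the C# and Python sides.
        ChannelId = new Guid("621f0a70-4f87-11ea-a6bf-784f4387d1f7");
    }

    public override void OnMessageReceived(IncomingMessage msg)
    {
        // IncomingMessage exposes typed readers (with optional defaults).
        Debug.Log("From Python: " + msg.ReadString());
    }

    public void SendToPython(string text)
    {
        using (var msg = new OutgoingMessage())
        {
            msg.WriteString(text);
            QueueMessageToSend(msg);
        }
    }
}

// Assumed registration entry point, e.g. from a MonoBehaviour's Awake():
//   SideChannelUtils.RegisterSideChannel(new StringLogChannel());
```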
- Fixed an issue which caused self-play training sessions to consume a lot of memory. (#3451)
- Fixed an IndexError when using GAIL or behavioral cloning with demonstrations recorded with 0.14.0 or later (#3464)
- Fixed an issue which caused self-play training sessions to consume a lot of
memory. (#3451)
- Fixed an IndexError when using GAIL or behavioral cloning with demonstrations
recorded with 0.14.0 or later (#3464)
- Fixed a bug with the rewards of multiple Agents in the gym interface (#3471, #3496)
- Fixed a bug with the rewards of multiple Agents in the gym interface (#3471,
#3496)
- A new self-play mechanism for training agents in adversarial scenarios was added (#3194)
- Tennis and Soccer environments were refactored to enable training with self-play (#3194, #3331)
- UnitySDK folder was split into a Unity Package (com.unity.ml-agents) and our examples were moved to the Project folder (#3267)
- A new self-play mechanism for training agents in adversarial scenarios was
added (#3194)
- Tennis and Soccer environments were refactored to enable training with
self-play (#3194, #3331)
- UnitySDK folder was split into a Unity Package (com.unity.ml-agents) and our
examples were moved to the Project folder (#3267)
- In order to reduce the size of the API, several classes and methods were marked as internal or private. Some public fields on the Agent were trimmed (#3342, #3353, #3269)
- Decision Period and on-demand decision checkboxes were removed from the Agent. on-demand decision is now the default (#3243)
- Calling Done() on the Agent will reset it immediately and call the AgentReset virtual method (#3291, #3242)
- The "Reset on Done" setting in AgentParameters was removed; this is now always true. AgentOnDone virtual method on the Agent was removed (#3311, #3222)
- Trainer steps are now counted per-Agent, not per-environment as in previous versions. For instance, if you have 10 Agents in the scene, 20 environment steps now correspond to 200 steps as printed in the terminal and in Tensorboard (#3113)
- In order to reduce the size of the API, several classes and methods were
marked as internal or private. Some public fields on the Agent were trimmed
(#3342, #3353, #3269)
- Decision Period and on-demand decision checkboxes were removed from the Agent.
on-demand decision is now the default (#3243)
- Calling Done() on the Agent will reset it immediately and call the AgentReset
virtual method (#3291, #3242)
- The "Reset on Done" setting in AgentParameters was removed; this is now always
true. AgentOnDone virtual method on the Agent was removed (#3311, #3222)
- Trainer steps are now counted per-Agent, not per-environment as in previous
versions. For instance, if you have 10 Agents in the scene, 20 environment
steps now correspond to 200 steps as printed in the terminal and in
Tensorboard (#3113)
- Curriculum config files are now YAML formatted and all curricula for a training run are combined into a single file (#3186)
- ML-Agents components, such as BehaviorParameters and various Sensor implementations, now appear in the Components menu (#3231)
- Exceptions are now raised in Unity (in debug mode only) if NaN observations or rewards are passed (#3221)
- RayPerception MonoBehavior, which was previously deprecated, was removed (#3304)
- Uncompressed visual (i.e. 3d float arrays) observations are now supported. CameraSensorComponent and RenderTextureSensor now have an option to write uncompressed observations (#3148)
- Agent’s handling of observations during training was improved so that an extra copy of the observations is no longer maintained (#3229)
- Error message for missing trainer config files was improved to include the absolute path (#3230)
- Curriculum config files are now YAML formatted and all curricula for a
training run are combined into a single file (#3186)
- ML-Agents components, such as BehaviorParameters and various Sensor
implementations, now appear in the Components menu (#3231)
- Exceptions are now raised in Unity (in debug mode only) if NaN observations or
rewards are passed (#3221)
- RayPerception MonoBehavior, which was previously deprecated, was removed
(#3304)
- Uncompressed visual (i.e. 3d float arrays) observations are now supported.
CameraSensorComponent and RenderTextureSensor now have an option to write
uncompressed observations (#3148)
- Agent’s handling of observations during training was improved so that an extra
copy of the observations is no longer maintained (#3229)
- Error message for missing trainer config files was improved to include the
absolute path (#3230)
- A bug that caused RayPerceptionSensor to behave inconsistently with transforms that have non-1 scale was fixed (#3321)
- Some small bugfixes to tensorflow_to_barracuda.py were backported from the barracuda release (#3341)
- Base port in the jupyter notebook example was updated to use the same port that the editor uses (#3283)
- A bug that caused RayPerceptionSensor to behave inconsistently with transforms
that have non-1 scale was fixed (#3321)
- Some small bugfixes to tensorflow_to_barracuda.py were backported from the
barracuda release (#3341)
- Base port in the jupyter notebook example was updated to use the same port
that the editor uses (#3283)
### This is the first release of *Unity Package ML-Agents*.
### This is the first release of _Unity Package ML-Agents_.
*Short description of this release*
_Short description of this release_

com.unity.ml-agents/Editor/DemonstrationDrawer.cs (78 changes)


using System.Collections.Generic;
using System.Text;
using UnityEditor;
using MLAgents.Demonstrations;

namespace MLAgents.Editor
{
/// <summary>
/// Renders a custom UI for Demonstration Scriptable Object.
/// Renders a custom UI for DemonstrationSummary ScriptableObject.
[CustomEditor(typeof(Demonstration))]
[CustomEditor(typeof(DemonstrationSummary))]
SerializedProperty m_ObservationShapes;
m_ObservationShapes = serializedObject.FindProperty("observationSummaries");
}
/// <summary>

{
var nameProp = property.FindPropertyRelative("demonstrationName");
var expProp = property.FindPropertyRelative("numberExperiences");
var epiProp = property.FindPropertyRelative("numberEpisodes");
var rewProp = property.FindPropertyRelative("meanReward");
var experiencesProp = property.FindPropertyRelative("numberSteps");
var episodesProp = property.FindPropertyRelative("numberEpisodes");
var rewardsProp = property.FindPropertyRelative("meanReward");
var expLabel = expProp.displayName + ": " + expProp.intValue;
var epiLabel = epiProp.displayName + ": " + epiProp.intValue;
var rewLabel = rewProp.displayName + ": " + rewProp.floatValue;
var experiencesLabel = experiencesProp.displayName + ": " + experiencesProp.intValue;
var episodesLabel = episodesProp.displayName + ": " + episodesProp.intValue;
var rewardsLabel = rewardsProp.displayName + ": " + rewardsProp.floatValue;
EditorGUILayout.LabelField(expLabel);
EditorGUILayout.LabelField(epiLabel);
EditorGUILayout.LabelField(rewLabel);
EditorGUILayout.LabelField(experiencesLabel);
EditorGUILayout.LabelField(episodesLabel);
EditorGUILayout.LabelField(rewardsLabel);
/// Constructs label for action size array.
/// Constructs label for a serialized integer array.
static string BuildActionArrayLabel(SerializedProperty actionSizeProperty)
static string BuildIntArrayLabel(SerializedProperty actionSizeProperty)
{
var actionSize = actionSizeProperty.arraySize;
var actionLabel = new StringBuilder("[ ");

}
/// <summary>
/// Renders Inspector UI for Brain Parameters of Demonstration.
/// Renders Inspector UI for BrainParameters of a DemonstrationSummary.
/// Only the Action size and type are used from the BrainParameters.
void MakeBrainParametersProperty(SerializedProperty property)
void MakeActionsProperty(SerializedProperty property)
var vecObsSizeProp = property.FindPropertyRelative("vectorObservationSize");
var numStackedProp = property.FindPropertyRelative("numStackedVectorObservations");
var vecObsSizeLabel = vecObsSizeProp.displayName + ": " + vecObsSizeProp.intValue;
var numStackedLabel = numStackedProp.displayName + ": " + numStackedProp.intValue;
actSizeProperty.displayName + ": " + BuildActionArrayLabel(actSizeProperty);
actSizeProperty.displayName + ": " + BuildIntArrayLabel(actSizeProperty);
EditorGUILayout.LabelField(vecObsSizeLabel);
EditorGUILayout.LabelField(numStackedLabel);
/// <summary>
/// Render the observation shapes of a DemonstrationSummary.
/// </summary>
/// <param name="obsSummariesProperty"></param>
void MakeObservationsProperty(SerializedProperty obsSummariesProperty)
{
var shapesLabels = new List<string>();
var numObservations = obsSummariesProperty.arraySize;
for (var i = 0; i < numObservations; i++)
{
var summary = obsSummariesProperty.GetArrayElementAtIndex(i);
var shapeProperty = summary.FindPropertyRelative("shape");
shapesLabels.Add(BuildIntArrayLabel(shapeProperty));
}
var shapeLabel = $"Shapes: {string.Join(", ", shapesLabels)}";
EditorGUILayout.LabelField(shapeLabel);
}
EditorGUI.indentLevel++;
EditorGUILayout.LabelField("Brain Parameters", EditorStyles.boldLabel);
MakeBrainParametersProperty(m_BrainParameters);
EditorGUI.indentLevel--;
EditorGUILayout.LabelField("Observations", EditorStyles.boldLabel);
EditorGUI.indentLevel++;
MakeObservationsProperty(m_ObservationShapes);
EditorGUI.indentLevel--;
EditorGUILayout.LabelField("Actions", EditorStyles.boldLabel);
EditorGUI.indentLevel++;
MakeActionsProperty(m_BrainParameters);
EditorGUI.indentLevel--;
serializedObject.ApplyModifiedProperties();
}
}

com.unity.ml-agents/Editor/DemonstrationImporter.cs (26 changes)


using System;
using System.Collections.Generic;
using System.IO;
using MLAgents.CommunicatorObjects;
using UnityEditor;

try
{
// Read first two proto objects containing metadata and brain parameters.
// Read first three proto objects containing metadata, brain parameters, and observations.
Stream reader = File.OpenRead(ctx.assetPath);
var metaDataProto = DemonstrationMetaProto.Parser.ParseDelimitedFrom(reader);

var brainParamsProto = BrainParametersProto.Parser.ParseDelimitedFrom(reader);
var brainParameters = brainParamsProto.ToBrainParameters();
// Read the first AgentInfoActionPair so that we can get the observation sizes.
List<ObservationSummary> observationSummaries;
try
{
var agentInfoActionPairProto = AgentInfoActionPairProto.Parser.ParseDelimitedFrom(reader);
observationSummaries = agentInfoActionPairProto.GetObservationSummaries();
}
catch
{
// Just in case there weren't any AgentInfoActionPair or they couldn't be read.
observationSummaries = new List<ObservationSummary>();
}
var demonstration = ScriptableObject.CreateInstance<Demonstration>();
demonstration.Initialize(brainParameters, metaData);
userData = demonstration.ToString();
var demonstrationSummary = ScriptableObject.CreateInstance<DemonstrationSummary>();
demonstrationSummary.Initialize(brainParameters, metaData, observationSummaries);
userData = demonstrationSummary.ToString();
ctx.AddObjectToAsset(ctx.assetPath, demonstration, texture);
ctx.SetMainObject(demonstration);
ctx.AddObjectToAsset(ctx.assetPath, demonstrationSummary, texture);
ctx.SetMainObject(demonstrationSummary);
}
catch
{

com.unity.ml-agents/Runtime/Agent.cs (21 changes)


m_CumulativeReward = 0f;
m_RequestAction = false;
m_RequestDecision = false;
Array.Clear(m_Info.storedVectorActions, 0, m_Info.storedVectorActions.Length);
}
/// <summary>

return;
}
m_Info.storedVectorActions = m_Action.vectorActions;
if (m_Info.done)
{
Array.Clear(m_Info.storedVectorActions, 0, m_Info.storedVectorActions.Length);
}
else
{
Array.Copy(m_Action.vectorActions, m_Info.storedVectorActions, m_Action.vectorActions.Length);
}
m_ActionMasker.ResetMask();
UpdateSensors();
using (TimerStack.Instance.Scoped("CollectObservations"))

void DecideAction()
{
m_Action.vectorActions = m_Brain?.DecideAction();
}
var action = m_Brain?.DecideAction();
if (action == null)
{
Array.Clear(m_Action.vectorActions, 0, m_Action.vectorActions.Length);
}
else
{
Array.Copy(action, m_Action.vectorActions, action.Length);
}
}
}

com.unity.ml-agents/Runtime/Communicator/GrpcExtensions.cs (68 changes)


{
internal static class GrpcExtensions
{
#region AgentInfo
/// <summary>
/// Converts a AgentInfo to a protobuf generated AgentInfoActionPairProto
/// </summary>

}
/// <summary>
/// Get summaries for the observations in the AgentInfo part of the AgentInfoActionPairProto.
/// </summary>
/// <param name="infoActionPair"></param>
/// <returns></returns>
public static List<ObservationSummary> GetObservationSummaries(this AgentInfoActionPairProto infoActionPair)
{
List<ObservationSummary> summariesOut = new List<ObservationSummary>();
var agentInfo = infoActionPair.AgentInfo;
foreach (var obs in agentInfo.Observations)
{
var summary = new ObservationSummary();
summary.shape = obs.Shape.ToArray();
summariesOut.Add(summary);
}
return summariesOut;
}
#endregion
#region BrainParameters
/// <summary>
/// Converts a Brain into to a Protobuf BrainInfoProto so it can be sent
/// </summary>
/// <returns>The BrainInfoProto generated.</returns>

}
/// <summary>
/// Convert a BrainParametersProto to a BrainParameters struct.
/// </summary>
/// <param name="bpp">An instance of a brain parameters protobuf object.</param>
/// <returns>A BrainParameters struct.</returns>
public static BrainParameters ToBrainParameters(this BrainParametersProto bpp)
{
var bp = new BrainParameters
{
vectorActionSize = bpp.VectorActionSize.ToArray(),
vectorActionDescriptions = bpp.VectorActionDescriptions.ToArray(),
vectorActionSpaceType = (SpaceType)bpp.VectorActionSpaceType
};
return bp;
}
#endregion
#region DemonstrationMetaData
/// <summary>
/// Convert metadata object to proto object.
/// </summary>
public static DemonstrationMetaProto ToProto(this DemonstrationMetaData dm)

ApiVersion = DemonstrationMetaData.ApiVersion,
MeanReward = dm.meanReward,
NumberSteps = dm.numberExperiences,
NumberSteps = dm.numberSteps,
NumberEpisodes = dm.numberEpisodes,
DemonstrationName = dm.demonstrationName
};

var dm = new DemonstrationMetaData
{
numberEpisodes = demoProto.NumberEpisodes,
numberExperiences = demoProto.NumberSteps,
numberSteps = demoProto.NumberSteps,
meanReward = demoProto.MeanReward,
demonstrationName = demoProto.DemonstrationName
};

}
return dm;
}
/// <summary>
/// Convert a BrainParametersProto to a BrainParameters struct.
/// </summary>
/// <param name="bpp">An instance of a brain parameters protobuf object.</param>
/// <returns>A BrainParameters struct.</returns>
public static BrainParameters ToBrainParameters(this BrainParametersProto bpp)
{
var bp = new BrainParameters
{
vectorActionSize = bpp.VectorActionSize.ToArray(),
vectorActionDescriptions = bpp.VectorActionDescriptions.ToArray(),
vectorActionSpaceType = (SpaceType)bpp.VectorActionSpaceType
};
return bp;
}
#endregion
public static UnityRLInitParameters ToUnityRLInitParameters(this UnityRLInitializationInputProto inputProto)
{

};
}
#region AgentAction
public static AgentAction ToAgentAction(this AgentActionProto aap)
{
return new AgentAction

}
return agentActions;
}
#endregion
#region Observations
public static ObservationProto ToProto(this Observation obs)
{
ObservationProto obsProto = null;

observationProto.Shape.AddRange(shape);
return observationProto;
}
#endregion
}
}

com.unity.ml-agents/Runtime/Demonstrations/DemonstrationWriter.cs (2 changes)


}
// Increment meta-data counters.
m_MetaData.numberExperiences++;
m_MetaData.numberSteps++;
m_CumulativeReward += info.reward;
if (info.done)
{

com.unity.ml-agents/Runtime/Policies/HeuristicPolicy.cs (17 changes)


ActionGenerator m_Heuristic;
float[] m_LastDecision;
int m_numActions;
bool m_Done;
bool m_DecisionRequested;
WriteAdapter m_WriteAdapter = new WriteAdapter();
NullList m_NullList = new NullList();

{
m_Heuristic = heuristic;
m_numActions = numActions;
m_LastDecision = new float[m_numActions];
}
/// <inheritdoc />

if (!info.done)
{
// Reset m_LastDecision each time.
m_LastDecision = new float[m_numActions];
m_Heuristic.Invoke(m_LastDecision);
}
m_Done = info.done;
m_DecisionRequested = true;
if (!m_Done && m_DecisionRequested)
{
m_Heuristic.Invoke(m_LastDecision);
}
m_DecisionRequested = false;
return m_LastDecision;
}
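Pulling the scattered old/new lines above together, the new HeuristicPolicy only invokes the heuristic when a decision was actually requested and the agent is not done, and it reuses a single `m_LastDecision` buffer instead of allocating a fresh array each call. A self-contained restatement of that control flow (the `Action<float[]>` stand-in and class name are illustrative, not the real types):

```csharp
using System;

// Sketch of the new HeuristicPolicy behavior; not the actual class.
class HeuristicPolicySketch
{
    readonly Action<float[]> m_Heuristic;
    readonly float[] m_LastDecision;
    bool m_Done;
    bool m_DecisionRequested;

    public HeuristicPolicySketch(Action<float[]> heuristic, int numActions)
    {
        m_Heuristic = heuristic;
        m_LastDecision = new float[numActions];
    }

    public void RequestDecision(bool done)
    {
        m_Done = done;
        m_DecisionRequested = true;
    }

    public float[] DecideAction()
    {
        if (!m_Done && m_DecisionRequested)
        {
            m_Heuristic.Invoke(m_LastDecision);   // fills the reused buffer in place
        }
        m_DecisionRequested = false;
        return m_LastDecision;
    }
}
```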

com.unity.ml-agents/Runtime/Timer.cs (35 changes)


}
/// <summary>
/// Tracks the most recent value of a metric. This is analogous to gauges in statsd.
/// Tracks the most recent value of a metric. This is analogous to gauges in statsd and Prometheus.
/// </summary>
[DataContract]
internal class GaugeNode

/// <summary>
/// The most recent value that the gauge was set to.
/// </summary>
/// <summary>
/// The smallest value that has been seen for the gauge since it was created.
/// </summary>
/// <summary>
/// The largest value that has been seen for the gauge since it was created.
/// </summary>
/// <summary>
/// The exponential moving average of the gauge value. This will take all values into account,
/// but weights older values less as more values are added.
/// </summary>
/// <summary>
/// The running average of all gauge values.
/// </summary>
[DataMember]
public float runningAverage;
/// <summary>
/// The number of times the gauge has been updated.
/// </summary>
runningAverage = value;
minValue = value;
maxValue = value;
count = 1;

{
++count;
++count;
// Update running average - see https://www.johndcook.com/blog/standard_deviation/ for formula.
runningAverage = runningAverage + (newValue - runningAverage) / count;
}
}
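The `runningAverage` update above is the standard incremental mean; a tiny standalone check that it reproduces the plain arithmetic mean (matching the expectations in the test below):

```csharp
using System;

static class RunningAverageDemo
{
    static void Main()
    {
        float runningAverage = 0f;
        int count = 0;
        float[] samples = { 1f, 2f, 3f };   // e.g. the "increasing" gauge values

        foreach (var v in samples)
        {
            ++count;
            // Same update as GaugeNode: avg += (new - avg) / count
            runningAverage += (v - runningAverage) / count;
        }

        Console.WriteLine(runningAverage);   // prints 2, the mean of 1, 2, 3
    }
}
```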

com.unity.ml-agents/Tests/Editor/TimerTest.cs (43 changes)


myTimer.Reset();
Assert.AreEqual(myTimer.RootNode.Children, null);
}
[Test]
public void TestGauges()
{
TimerStack myTimer = TimerStack.Instance;
myTimer.Reset();
// Simple test - adding 1's should keep that for the weighted and running averages.
myTimer.SetGauge("one", 1.0f);
var oneNode = myTimer.RootNode.Gauges["one"];
Assert.AreEqual(oneNode.weightedAverage, 1.0f);
Assert.AreEqual(oneNode.runningAverage, 1.0f);
for (int i = 0; i < 10; i++)
{
myTimer.SetGauge("one", 1.0f);
}
Assert.AreEqual(oneNode.weightedAverage, 1.0f);
Assert.AreEqual(oneNode.runningAverage, 1.0f);
// Try some more interesting values
myTimer.SetGauge("increasing", 1.0f);
myTimer.SetGauge("increasing", 2.0f);
myTimer.SetGauge("increasing", 3.0f);
myTimer.SetGauge("decreasing", 3.0f);
myTimer.SetGauge("decreasing", 2.0f);
myTimer.SetGauge("decreasing", 1.0f);
var increasingNode = myTimer.RootNode.Gauges["increasing"];
var decreasingNode = myTimer.RootNode.Gauges["decreasing"];
// Expect the running average to be (roughly) the same,
// but weighted averages will be biased differently.
Assert.AreEqual(increasingNode.runningAverage, 2.0f);
Assert.AreEqual(decreasingNode.runningAverage, 2.0f);
// The older values are actually weighted more heavily, so we expect the
// increasing series to have a lower moving average.
Assert.Less(increasingNode.weightedAverage, decreasingNode.weightedAverage);
}
}
}

config/trainer_config.yaml (36 changes)


swap_steps: 50000
team_change: 100000
Soccer:
Goalie:
normalize: false
max_steps: 5.0e7
learning_rate_schedule: constant
batch_size: 2048
buffer_size: 20480
hidden_units: 512
time_horizon: 1000
num_layers: 2
self_play:
window: 10
play_against_latest_model_ratio: 0.5
save_steps: 50000
swap_steps: 25000
team_change: 200000
Striker:
normalize: false
max_steps: 5.0e7
learning_rate_schedule: constant
batch_size: 2048
buffer_size: 20480
hidden_units: 512
time_horizon: 1000
num_layers: 2
self_play:
window: 10
play_against_latest_model_ratio: 0.5
save_steps: 50000
swap_steps: 100000
team_change: 200000
SoccerTwos:
normalize: false
max_steps: 5.0e7
learning_rate_schedule: constant

play_against_latest_model_ratio: 0.5
save_steps: 50000
swap_steps: 50000
team_change: 100000
team_change: 200000
CrawlerStatic:
normalize: true

docs/FAQ.md (92 changes)


## Installation problems
### Tensorflow dependency
ML Agents requires TensorFlow; if you don't already have it installed, `pip`
will try to install it when you install the ml-agents package.

If `pip` cannot find a suitable TensorFlow package to install,
it means that there is no version of TensorFlow for your python environment.
Some known potential causes are:

- You're using 32-bit python instead of 64-bit. See the answer
  [here](https://stackoverflow.com/a/1405971/224264) for how to tell which you
  have installed.
- You're using python 3.8. Tensorflow plans to release packages for this as soon
  as possible; see
  [this issue](https://github.com/tensorflow/tensorflow/issues/33374) for more
  details.
- You have the `tensorflow-gpu` package installed. This is equivalent to
  `tensorflow`, however `pip` doesn't recognize this. The best way to resolve
  this is to update to `tensorflow==1.15.0`, which provides GPU support in the
  same package (see the
  [release notes](https://github.com/tensorflow/tensorflow/issues/33374) for
  more details).
- You're on another architecture (e.g. ARM) which requires vendor-provided
  packages.

In all of these cases, the issue is a pip/python environment setup issue. Please
search the tensorflow github issues for similar problems and solutions before
creating a new issue.

## Scripting Runtime Environment not setup correctly

If you haven't switched your scripting runtime version from .NET 3.5 to .NET 4.6
or .NET 4.x, you will see an error message like this:

```console
error CS1061: Type `System.Text.StringBuilder' does not contain a definition for `Clear' and no extension method `Clear' of type `System.Text.StringBuilder' could be found. Are you missing an assembly reference?
```

This is because .NET 3.5 doesn't support the `Clear()` method on `StringBuilder`;
refer to
[Setting Up The ML-Agents Toolkit Within Unity](Installation.md#setting-up-ml-agent-within-unity)
for the solution.
## Environment Permission Error

If you directly import your Unity environment without building it in the editor,
you might need to give it additional permissions to execute it.
If you receive such a permission error on macOS, run:

```
chmod -R 755 *.app
```
On Windows, you can find
[instructions](<https://technet.microsoft.com/en-us/library/cc754344(v=ws.11).aspx>).
## Environment Connection Timeout

There may be a number of possible causes:
- _Cause_: There may be no agent in the scene.
- _Cause_: On OSX, the firewall may be preventing communication with the
  environment.
- _Cause_: An error happened in the Unity Environment preventing communication.
  _Solution_: Look into the
  [log files](https://docs.unity3d.com/Manual/LogFiles.html) generated by the
  Unity Environment to figure out what error happened.
- _Cause_: You have assigned `HTTP_PROXY` and `HTTPS_PROXY` values in your
  environment variables.
## Communication port {} still in use

If you receive an exception
`"Couldn't launch new environment because communication port {} is still in use. "`,
you can change the worker number in the Python script when calling

```python
UnityEnvironment(file_name=filename, worker_id=X)
```

## Mean reward : nan

If you receive a message `Mean reward : nan` when attempting to train a model
using PPO, this is due to the episodes of the Learning Environment not
terminating. In order to address this, set `Max Steps` for the Agents within the
Scene Inspector to a value greater than 0. Alternatively, it is possible to
manually set `done` conditions for episodes from within scripts for custom
episode-terminating events.
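For illustration, here is a minimal sketch of ending an episode from code when a custom condition is met. The `Unity.MLAgents` namespace, the `AddReward`/`EndEpisode` calls, and the `FellBelowWorld()` helper are assumptions, so adapt them to the API version you are using.

```csharp
using Unity.MLAgents;

// Sketch only: the namespace, signatures, and the FellBelowWorld() helper are
// assumptions, not an exact excerpt of a shipped example.
public class MyAgent : Agent
{
    public override void OnActionReceived(float[] vectorAction)
    {
        // ... apply vectorAction to the environment here ...

        if (FellBelowWorld())
        {
            AddReward(-1.0f);
            // Explicitly marking the episode as done ensures episodes terminate,
            // which avoids the "Mean reward : nan" message during training.
            EndEpisode();
        }
    }

    bool FellBelowWorld()
    {
        // Hypothetical custom condition; replace with your own check.
        return transform.position.y < -10f;
    }
}
```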
## Problems with training on AWS

Please refer to the
[Training on Amazon Web Service FAQ](Training-on-Amazon-Web-Service.md#faq).

# Known Issues

## Release 0.10.0

- ml-agents 0.10.0 and earlier were incompatible with TensorFlow 1.15.0; the
  graph could contain an operator that `tensorflow_to_barracuda` didn't handle.
  This was fixed in the 0.11.0 release.

346
docs/Getting-Started.md


# Getting Started Guide
This guide walks through the end-to-end process of opening one of our
[example environments](Learning-Environment-Examples.md) in Unity, training an
Agent in it, and embedding the trained model into the Unity environment. After
reading this tutorial, you should be able to train any of the example
environments. If you are not familiar with the
[Unity Engine](https://unity3d.com/unity), view our
[Background: Unity](Background-Unity.md) page for helpful pointers.
Additionally, if you're not familiar with machine learning, view our
[Background: Machine Learning](Background-Machine-Learning.md) page for a brief
overview and helpful pointers.
For this guide, we'll use the **3D Balance Ball** environment which contains a
number of agent cubes and balls (which are all copies of each other). Each agent
cube tries to keep its ball from falling by rotating either horizontally or
vertically. In this environment, an agent cube is an **Agent** that receives a
reward for every step that it balances the ball. An agent is also penalized with
a negative reward for dropping the ball. The goal of the training process is to
have the agents learn to balance the ball on their head.
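To make that reward structure concrete, here is a rough sketch of how the per-step reward and the drop penalty could be wired up in an Agent script. The reward values, the `ball` field, and the `BallDropped()` check are illustrative assumptions; the shipped Ball3DAgent may differ in detail.

```csharp
using Unity.MLAgents;
using UnityEngine;

// Rough sketch of the reward scheme described above; values and helpers are
// assumptions, not the exact code of the example project.
public class BalanceBallSketchAgent : Agent
{
    public Transform ball; // assigned in the Inspector

    public override void OnActionReceived(float[] vectorAction)
    {
        // ... rotate the agent cube according to vectorAction ...

        if (BallDropped())
        {
            SetReward(-1.0f); // penalty for letting the ball fall
            EndEpisode();     // reset and try again
        }
        else
        {
            SetReward(0.1f);  // small reward for every step the ball stays balanced
        }
    }

    bool BallDropped()
    {
        // Hypothetical check: the ball has fallen well below the agent cube.
        return ball.position.y < transform.position.y - 1f;
    }
}
```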