
Merge branch 'master' of github.com:Unity-Technologies/ml-agents into develop-sac-apex

/develop/sac-apex
Ervin Teng 5 years ago
Current commit
0bc4cc1e
51 changed files with 3208 additions and 4183 deletions
  1. Project/Assets/ML-Agents/Examples/3DBall/Demos/Expert3DBall.demo.meta (2)
  2. Project/Assets/ML-Agents/Examples/3DBall/Demos/Expert3DBallHard.demo.meta (2)
  3. Project/Assets/ML-Agents/Examples/Basic/Demos/ExpertBasic.demo.meta (2)
  4. Project/Assets/ML-Agents/Examples/Bouncer/Demos/ExpertBouncer.demo.meta (2)
  5. Project/Assets/ML-Agents/Examples/Crawler/Demos/ExpertCrawlerDyn.demo.meta (2)
  6. Project/Assets/ML-Agents/Examples/Crawler/Demos/ExpertCrawlerSta.demo.meta (2)
  7. Project/Assets/ML-Agents/Examples/FoodCollector/Demos/ExpertFood.demo.meta (2)
  8. Project/Assets/ML-Agents/Examples/GridWorld/Demos/ExpertGrid.demo.meta (2)
  9. Project/Assets/ML-Agents/Examples/Hallway/Demos/ExpertHallway.demo.meta (2)
  10. Project/Assets/ML-Agents/Examples/PushBlock/Demos/ExpertPush.demo.meta (2)
  11. Project/Assets/ML-Agents/Examples/Pyramids/Demos/ExpertPyramid.demo.meta (2)
  12. Project/Assets/ML-Agents/Examples/Reacher/Demos/ExpertReacher.demo.meta (2)
  13. Project/Assets/ML-Agents/Examples/Tennis/Demos/ExpertTennis.demo.meta (2)
  14. Project/Assets/ML-Agents/Examples/Walker/Demos/ExpertWalker.demo.meta (2)
  15. Project/ProjectSettings/ProjectVersion.txt (2)
  16. com.unity.ml-agents/CHANGELOG.md (308)
  17. com.unity.ml-agents/Editor/DemonstrationDrawer.cs (78)
  18. com.unity.ml-agents/Editor/DemonstrationImporter.cs (26)
  19. com.unity.ml-agents/Runtime/Communicator/GrpcExtensions.cs (68)
  20. com.unity.ml-agents/Runtime/Demonstrations/DemonstrationWriter.cs (2)
  21. com.unity.ml-agents/Runtime/Timer.cs (35)
  22. com.unity.ml-agents/Tests/Editor/TimerTest.cs (43)
  23. docs/FAQ.md (92)
  24. docs/Getting-Started.md (346)
  25. docs/Installation-Anaconda-Windows.md (120)
  26. docs/Learning-Environment-Design-Agents.md (508)
  27. docs/Learning-Environment-Examples.md (644)
  28. docs/Learning-Environment-Executable.md (84)
  29. docs/ML-Agents-Overview.md (39)
  30. docs/Readme.md (100)
  31. docs/Training-Imitation-Learning.md (6)
  32. docs/Training-ML-Agents.md (368)
  33. docs/Training-on-Amazon-Web-Service.md (147)
  34. docs/Training-on-Microsoft-Azure.md (145)
  35. docs/Using-Docker.md (36)
  36. docs/Using-Tensorboard.md (82)
  37. docs/Using-Virtual-Environment.md (66)
  38. docs/images/demo_inspector.png (257)
  39. docs/images/docker_build_settings.png (999)
  40. docs/images/learning_environment_basic.png (198)
  41. docs/images/learning_environment_example.png (545)
  42. docs/images/unity_package_json.png (604)
  43. docs/images/unity_package_manager_window.png (999)
  44. ml-agents/mlagents/trainers/learn.py (95)
  45. com.unity.ml-agents/Runtime/Demonstrations/DemonstrationMetaData.cs (22)
  46. com.unity.ml-agents/Runtime/Demonstrations/DemonstrationMetaData.cs.meta (11)
  47. com.unity.ml-agents/Runtime/Demonstrations/DemonstrationSummary.cs (37)
  48. docs/images/learning_environment_full.png (122)
  49. com.unity.ml-agents/Runtime/Demonstrations/Demonstration.cs (38)
  50. docs/Training-on-Microsoft-Azure-Custom-Instance.md (91)
  51. /com.unity.ml-agents/Runtime/Demonstrations/DemonstrationSummary.cs.meta (0)

2
Project/Assets/ML-Agents/Examples/3DBall/Demos/Expert3DBall.demo.meta


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/3DBall/Demos/Expert3DBall.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

2
Project/Assets/ML-Agents/Examples/3DBall/Demos/Expert3DBallHard.demo.meta


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/3DBall/Demos/Expert3DBallHard.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

2
Project/Assets/ML-Agents/Examples/Basic/Demos/ExpertBasic.demo.meta


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/Basic/Demos/ExpertBasic.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

2
Project/Assets/ML-Agents/Examples/Bouncer/Demos/ExpertBouncer.demo.meta


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/Bouncer/Demos/ExpertBouncer.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

2
Project/Assets/ML-Agents/Examples/Crawler/Demos/ExpertCrawlerDyn.demo.meta


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/Crawler/Demos/ExpertCrawlerDyn.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

2
Project/Assets/ML-Agents/Examples/Crawler/Demos/ExpertCrawlerSta.demo.meta


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/Crawler/Demos/ExpertCrawlerSta.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

2
Project/Assets/ML-Agents/Examples/FoodCollector/Demos/ExpertFood.demo.meta


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/FoodCollector/Demos/ExpertFood.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

2
Project/Assets/ML-Agents/Examples/GridWorld/Demos/ExpertGrid.demo.meta


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/GridWorld/Demos/ExpertGrid.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

2
Project/Assets/ML-Agents/Examples/Hallway/Demos/ExpertHallway.demo.meta


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/Hallway/Demos/ExpertHallway.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

2
Project/Assets/ML-Agents/Examples/PushBlock/Demos/ExpertPush.demo.meta


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/PushBlock/Demos/ExpertPush.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

2
Project/Assets/ML-Agents/Examples/Pyramids/Demos/ExpertPyramid.demo.meta


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/Pyramids/Demos/ExpertPyramid.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

2
Project/Assets/ML-Agents/Examples/Reacher/Demos/ExpertReacher.demo.meta


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/Reacher/Demos/ExpertReacher.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

2
Project/Assets/ML-Agents/Examples/Tennis/Demos/ExpertTennis.demo.meta


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/Tennis/Demos/ExpertTennis.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

2
Project/Assets/ML-Agents/Examples/Walker/Demos/ExpertWalker.demo.meta


fileIDToRecycleName:
11400000: Assets/ML-Agents/Examples/Walker/Demos/ExpertWalker.demo
externalObjects: {}
userData: ' (MLAgents.Demonstrations.Demonstration)'
userData: ' (MLAgents.Demonstrations.DemonstrationSummary)'
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 7bd65ce151aaa4a41a45312543c56be1, type: 3}

2
Project/ProjectSettings/ProjectVersion.txt


m_EditorVersion: 2018.4.18f1
m_EditorVersion: 2018.4.17f1

308
com.unity.ml-agents/CHANGELOG.md


# Changelog
and this project adheres to
[Semantic Versioning](http://semver.org/spec/v2.0.0.html).
- The `--load` and `--train` command-line flags have been deprecated. Training
now happens by default, and use `--resume` to resume training instead. (#3705)
- The Jupyter notebooks have been removed from the repository.
- Introduced the `SideChannelUtils` to register, unregister and access side
channels.
- `Academy.FloatProperties` was removed, please use
`SideChannelUtils.GetSideChannel<FloatPropertiesChannel>()` instead.
- Removed the multi-agent gym option from the gym wrapper. For multi-agent
scenarios, use the [Low Level Python API](../docs/Python-API.md).
- The low level Python API has changed. You can look at the document
[Low Level Python API documentation](../docs/Python-API.md) for more
information. If you use `mlagents-learn` for training, this should be a
transparent change.
- Added ability to start training (initialize model weights) from a previous run
ID. (#3710)
- The internal event `Academy.AgentSetStatus` was renamed to
`Academy.AgentPreStep` and made public.
- The offset logic was removed from DecisionRequester.
- The signature of `Agent.Heuristic()` was changed to take a `float[]` as a
parameter, instead of returning the array. This was done to prevent a common
source of error where users would return arrays of the wrong size (see the
example after this list).
- The communication API version has been bumped up to 1.0.0 and will use
[Semantic Versioning](https://semver.org/) to do compatibility checks for
communication between Unity and the Python process.
- The obsolete `Agent` methods `GiveModel`, `Done`, `InitializeAgent`,
`AgentAction` and `AgentReset` have been removed.
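To illustrate the new `Agent.Heuristic()` signature mentioned above, here is a minimal sketch. The class name and input axes are placeholders for this example; only the override signature and the idea of filling the supplied buffer come from the entry above.

```csharp
using MLAgents;
using UnityEngine;

// Hypothetical agent used only to show the new Heuristic() signature.
public class ManualControlAgent : Agent
{
    public override void Heuristic(float[] actionsOut)
    {
        // Fill the buffer supplied by the caller instead of returning a new array,
        // which removes the old failure mode of returning an array of the wrong size.
        actionsOut[0] = Input.GetAxis("Horizontal");
        actionsOut[1] = Input.GetAxis("Vertical");
    }
}
```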
- Format of console output has changed slightly and now matches the name of the
model/summary directory. (#3630, #3616)
- Added a feature to allow sending stats from C# environments to TensorBoard
(and other python StatsWriters). To do this from your code, use
`SideChannelUtils.GetSideChannel<StatsSideChannel>().AddStat(key, value)`
(#3660); see the example after this list.
- Renamed 'Generalization' feature to 'Environment Parameter Randomization'.
- Timer files now contain a dictionary of metadata, including things like the
package version numbers.
- SideChannel IncomingMessages methods now take an optional default argument,
which is used when trying to read more data than the message contains.
- The way that UnityEnvironment decides the port was changed. If no port is
specified, the behavior will depend on the `file_name` parameter. If it is
`None`, 5004 (the editor port) will be used; otherwise 5005 (the base
environment port) will be used.
- Fixed an issue where exceptions from environments provided a returncode of 0.
(#3680)
- Running `mlagents-learn` with the same `--run-id` twice will no longer
overwrite the existing files. (#3705)
- `StackingSensor` was changed from `internal` visibility to `public`
- Updated Barracuda to 0.6.3-preview.
- Model updates can now happen asynchronously with environment steps for better performance. (#3690)
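The stats entry above quotes the call used to push a metric from C#. A minimal sketch, assuming a hypothetical reporter component and a made-up stat name; the `AddStat` call itself is the one quoted above:

```csharp
using MLAgents;
using MLAgents.SideChannels;
using UnityEngine;

// Hypothetical helper that forwards a value to TensorBoard via the StatsSideChannel.
public class EpisodeRewardReporter : MonoBehaviour
{
    public void Report(float cumulativeReward)
    {
        SideChannelUtils.GetSideChannel<StatsSideChannel>()
            .AddStat("Environment/Episode Reward", cumulativeReward);
    }
}
```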
### Bug Fixes
- Fixed a display bug when viewing Demonstration files in the inspector. The
shapes of the observations in the file now display correctly. (#3771)
- Raise the wall in CrawlerStatic scene to prevent Agent from falling off.
(#3650)
- Fixed an issue where specifying `vis_encode_type` was required only for SAC.
(#3677)
- Fixed the reported entropy values for continuous actions (#3684)
- Fixed an issue where switching models using `SetModel()` during training would
use an excessive amount of memory. (#3664)
- Environment subprocesses now close immediately on timeout or wrong API
version. (#3679)
- Fixed an issue in the gym wrapper that would raise an exception if an Agent
called EndEpisode multiple times in the same step. (#3700)
- Fixed an issue where logging output was not visible; logging levels are now
set consistently. (#3703)
- `Agent.CollectObservations` now takes a VectorSensor argument. (#3352, #3389)
- Added `Agent.CollectDiscreteActionMasks` virtual method with a
`DiscreteActionMasker` argument to specify which discrete actions are
unavailable to the Agent. (#3525) A sketch follows this list.
- Beta support for ONNX export was added. If the `tf2onnx` python package is
installed, models will be saved to `.onnx` as well as `.nn` format. Note that
Barracuda 0.6.0 or later is required to import the `.onnx` files properly
- Multi-GPU training and the `--multi-gpu` option has been removed temporarily.
(#3345)
- All Sensor related code has been moved to the namespace `MLAgents.Sensors`.
- All SideChannel related code has been moved to the namespace
`MLAgents.SideChannels`.
- `BrainParameters` and `SpaceType` have been removed from the public API
- `BehaviorParameters` have been removed from the public API.
- The following methods in the `Agent` class have been deprecated and will be
removed in a later release:
- `InitializeAgent()` was renamed to `Initialize()`
- `AgentAction()` was renamed to `OnActionReceived()`
- `AgentReset()` was renamed to `OnEpisodeBegin()`
- `Done()` was renamed to `EndEpisode()`
- `GiveModel()` was renamed to `SetModel()`
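As a rough sketch of the `CollectDiscreteActionMasks` addition listed above: the branch index, the masked action indices, and the wall check are invented for illustration, and `SetMask(branch, indices)` is assumed to be the masker's call for disabling actions.

```csharp
using System.Collections.Generic;
using MLAgents;

// Hypothetical agent with one discrete action branch.
public class MaskingAgent : Agent
{
    public override void CollectDiscreteActionMasks(DiscreteActionMasker actionMasker)
    {
        // Example rule: when blocked by a wall, make actions 1 and 2 on branch 0
        // unavailable so the policy cannot sample them this step.
        if (IsBlockedByWall())
        {
            actionMasker.SetMask(0, new List<int> { 1, 2 });
        }
    }

    bool IsBlockedByWall()
    {
        return false; // placeholder condition for the sketch
    }
}
```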
- Monitor.cs was moved to Examples. (#3372)
- Automatic stepping for Academy is now controlled from the
AutomaticSteppingEnabled property. (#3376)
- The GetEpisodeCount, GetStepCount, GetTotalStepCount and methods of Academy
were changed to EpisodeCount, StepCount, TotalStepCount properties
respectively. (#3376)
- Several classes were changed from public to internal visibility. (#3390)
- Academy.RegisterSideChannel and UnregisterSideChannel methods were added.
(#3391)
- A tutorial on adding custom SideChannels was added (#3391)
- The stepping logic for the Agent and the Academy has been simplified (#3448)
- Update Barracuda to 0.6.1-preview
* The interface for `RayPerceptionSensor.PerceiveStatic()` was changed to take
an input class and write to an output class, and the method was renamed to
`Perceive()`.
- The checkpoint file suffix was changed from `.cptk` to `.ckpt` (#3470)
- The command-line argument used to determine the port that an environment will
listen on was changed from `--port` to `--mlagents-port`.
- `DemonstrationRecorder` can now record observations outside of the editor.
- `DemonstrationRecorder` now has an optional path for the demonstrations. This
will default to `Application.dataPath` if not set.
- `DemonstrationStore` was changed to accept a `Stream` for its constructor, and
was renamed to `DemonstrationWriter`
- The method `GetStepCount()` on the Agent class has been replaced with the
property getter `StepCount`
- `RayPerceptionSensorComponent` and related classes now display the debug
gizmos whenever the Agent is selected (not just Play mode).
- Most fields on `RayPerceptionSensorComponent` can now be changed while the
editor is in Play mode. The exceptions to this are fields that affect the
number of observations.
- Most fields on `CameraSensorComponent` and `RenderTextureSensorComponent` were
changed to private and replaced by properties with the same name.
- Unused static methods from the `Utilities` class (ShiftLeft, ReplaceRange,
AddRangeNoAlloc, and GetSensorFloatObservationSize) were removed.
- The `Agent` class is no longer abstract.
- SensorBase was moved out of the package and into the Examples directory.
- `AgentInfo.actionMasks` has been renamed to `AgentInfo.discreteActionMasks`.
- `DecisionRequester` has been made internal (you can still use the
DecisionRequesterComponent from the inspector). `RepeatAction` was renamed
`TakeActionsBetweenDecisions` for clarity. (#3555)
- The `IFloatProperties` interface has been removed.
- Fix #3579.
- Improved inference performance for models with multiple action branches.
(#3598)
- Fixed an issue when using GAIL with less than `batch_size` number of
demonstrations. (#3591)
- The interfaces to the `SideChannel` classes (on C# and python) have changed to
use new `IncomingMessage` and `OutgoingMessage` classes. These should make
reading and writing data to the channel easier. (#3596)
- Updated the ExpertPyramid.demo example demonstration file (#3613)
- Updated project version for example environments to 2018.4.18f1. (#3618)
- Changed the Product Name in the example environments to remove spaces, so that
the default build executable file doesn't contain spaces. (#3612)
- Fixed an issue which caused self-play training sessions to consume a lot of
memory. (#3451)
- Fixed an IndexError when using GAIL or behavioral cloning with demonstrations
recorded with 0.14.0 or later (#3464)
- Fixed a bug with the rewards of multiple Agents in the gym interface (#3471,
#3496)
- A new self-play mechanism for training agents in adversarial scenarios was
added (#3194)
- Tennis and Soccer environments were refactored to enable training with
self-play (#3194, #3331)
- UnitySDK folder was split into a Unity Package (com.unity.ml-agents) and our
examples were moved to the Project folder (#3267)
- In order to reduce the size of the API, several classes and methods were
marked as internal or private. Some public fields on the Agent were trimmed
(#3342, #3353, #3269)
- Decision Period and on-demand decision checkboxes were removed from the Agent.
on-demand decision is now the default (#3243)
- Calling Done() on the Agent will reset it immediately and call the AgentReset
virtual method (#3291, #3242)
- The "Reset on Done" setting in AgentParameters was removed; this is now always
true. AgentOnDone virtual method on the Agent was removed (#3311, #3222)
- Trainer steps are now counted per-Agent, not per-environment as in previous
versions. For instance, if you have 10 Agents in the scene, 20 environment
steps now correspond to 200 steps as printed in the terminal and in
Tensorboard (#3113)
- Curriculum config files are now YAML formatted and all curricula for a
training run are combined into a single file (#3186)
- ML-Agents components, such as BehaviorParameters and various Sensor
implementations, now appear in the Components menu (#3231)
- Exceptions are now raised in Unity (in debug mode only) if NaN observations or
rewards are passed (#3221)
- RayPerception MonoBehavior, which was previously deprecated, was removed
(#3304)
- Uncompressed visual (i.e. 3d float arrays) observations are now supported.
CameraSensorComponent and RenderTextureSensor now have an option to write
uncompressed observations (#3148)
- Agent’s handling of observations during training was improved so that an extra
copy of the observations is no longer maintained (#3229)
- Error message for missing trainer config files was improved to include the
absolute path (#3230)
- A bug that caused RayPerceptionSensor to behave inconsistently with transforms
that have non-1 scale was fixed (#3321)
- Some small bugfixes to tensorflow_to_barracuda.py were backported from the
barracuda release (#3341)
- Base port in the jupyter notebook example was updated to use the same port
that the editor uses (#3283)
### This is the first release of _Unity Package ML-Agents_.
_Short description of this release_

78
com.unity.ml-agents/Editor/DemonstrationDrawer.cs


using System.Collections.Generic;
using System.Text;
using UnityEditor;
using MLAgents.Demonstrations;

namespace MLAgents.Editor
{
/// <summary>
/// Renders a custom UI for Demonstration Scriptable Object.
/// Renders a custom UI for DemonstrationSummary ScriptableObject.
[CustomEditor(typeof(Demonstration))]
[CustomEditor(typeof(DemonstrationSummary))]
SerializedProperty m_ObservationShapes;
m_ObservationShapes = serializedObject.FindProperty("observationSummaries");
}
/// <summary>

{
var nameProp = property.FindPropertyRelative("demonstrationName");
var expProp = property.FindPropertyRelative("numberExperiences");
var epiProp = property.FindPropertyRelative("numberEpisodes");
var rewProp = property.FindPropertyRelative("meanReward");
var experiencesProp = property.FindPropertyRelative("numberSteps");
var episodesProp = property.FindPropertyRelative("numberEpisodes");
var rewardsProp = property.FindPropertyRelative("meanReward");
var expLabel = expProp.displayName + ": " + expProp.intValue;
var epiLabel = epiProp.displayName + ": " + epiProp.intValue;
var rewLabel = rewProp.displayName + ": " + rewProp.floatValue;
var experiencesLabel = experiencesProp.displayName + ": " + experiencesProp.intValue;
var episodesLabel = episodesProp.displayName + ": " + episodesProp.intValue;
var rewardsLabel = rewardsProp.displayName + ": " + rewardsProp.floatValue;
EditorGUILayout.LabelField(expLabel);
EditorGUILayout.LabelField(epiLabel);
EditorGUILayout.LabelField(rewLabel);
EditorGUILayout.LabelField(experiencesLabel);
EditorGUILayout.LabelField(episodesLabel);
EditorGUILayout.LabelField(rewardsLabel);
/// Constructs label for action size array.
/// Constructs label for a serialized integer array.
static string BuildActionArrayLabel(SerializedProperty actionSizeProperty)
static string BuildIntArrayLabel(SerializedProperty actionSizeProperty)
{
var actionSize = actionSizeProperty.arraySize;
var actionLabel = new StringBuilder("[ ");

}
/// <summary>
/// Renders Inspector UI for Brain Parameters of Demonstration.
/// Renders Inspector UI for BrainParameters of a DemonstrationSummary.
/// Only the Action size and type are used from the BrainParameters.
void MakeBrainParametersProperty(SerializedProperty property)
void MakeActionsProperty(SerializedProperty property)
var vecObsSizeProp = property.FindPropertyRelative("vectorObservationSize");
var numStackedProp = property.FindPropertyRelative("numStackedVectorObservations");
var vecObsSizeLabel = vecObsSizeProp.displayName + ": " + vecObsSizeProp.intValue;
var numStackedLabel = numStackedProp.displayName + ": " + numStackedProp.intValue;
actSizeProperty.displayName + ": " + BuildActionArrayLabel(actSizeProperty);
actSizeProperty.displayName + ": " + BuildIntArrayLabel(actSizeProperty);
EditorGUILayout.LabelField(vecObsSizeLabel);
EditorGUILayout.LabelField(numStackedLabel);
/// <summary>
/// Render the observation shapes of a DemonstrationSummary.
/// </summary>
/// <param name="obsSummariesProperty"></param>
void MakeObservationsProperty(SerializedProperty obsSummariesProperty)
{
var shapesLabels = new List<string>();
var numObservations = obsSummariesProperty.arraySize;
for (var i = 0; i < numObservations; i++)
{
var summary = obsSummariesProperty.GetArrayElementAtIndex(i);
var shapeProperty = summary.FindPropertyRelative("shape");
shapesLabels.Add(BuildIntArrayLabel(shapeProperty));
}
var shapeLabel = $"Shapes: {string.Join(", ", shapesLabels)}";
EditorGUILayout.LabelField(shapeLabel);
}
EditorGUI.indentLevel++;
EditorGUILayout.LabelField("Brain Parameters", EditorStyles.boldLabel);
MakeBrainParametersProperty(m_BrainParameters);
EditorGUI.indentLevel--;
EditorGUILayout.LabelField("Observations", EditorStyles.boldLabel);
EditorGUI.indentLevel++;
MakeObservationsProperty(m_ObservationShapes);
EditorGUI.indentLevel--;
EditorGUILayout.LabelField("Actions", EditorStyles.boldLabel);
EditorGUI.indentLevel++;
MakeActionsProperty(m_BrainParameters);
EditorGUI.indentLevel--;
serializedObject.ApplyModifiedProperties();
}
}

26
com.unity.ml-agents/Editor/DemonstrationImporter.cs


using System;
using System.Collections.Generic;
using System.IO;
using MLAgents.CommunicatorObjects;
using UnityEditor;

try
{
// Read first two proto objects containing metadata and brain parameters.
// Read first three proto objects containing metadata, brain parameters, and observations.
Stream reader = File.OpenRead(ctx.assetPath);
var metaDataProto = DemonstrationMetaProto.Parser.ParseDelimitedFrom(reader);

var brainParamsProto = BrainParametersProto.Parser.ParseDelimitedFrom(reader);
var brainParameters = brainParamsProto.ToBrainParameters();
// Read the first AgentInfoActionPair so that we can get the observation sizes.
List<ObservationSummary> observationSummaries;
try
{
var agentInfoActionPairProto = AgentInfoActionPairProto.Parser.ParseDelimitedFrom(reader);
observationSummaries = agentInfoActionPairProto.GetObservationSummaries();
}
catch
{
// Just in case there weren't any AgentInfoActionPair or they couldn't be read.
observationSummaries = new List<ObservationSummary>();
}
var demonstration = ScriptableObject.CreateInstance<Demonstration>();
demonstration.Initialize(brainParameters, metaData);
userData = demonstration.ToString();
var demonstrationSummary = ScriptableObject.CreateInstance<DemonstrationSummary>();
demonstrationSummary.Initialize(brainParameters, metaData, observationSummaries);
userData = demonstrationSummary.ToString();
ctx.AddObjectToAsset(ctx.assetPath, demonstration, texture);
ctx.SetMainObject(demonstration);
ctx.AddObjectToAsset(ctx.assetPath, demonstrationSummary, texture);
ctx.SetMainObject(demonstrationSummary);
}
catch
{

68
com.unity.ml-agents/Runtime/Communicator/GrpcExtensions.cs


{
internal static class GrpcExtensions
{
#region AgentInfo
/// <summary>
/// Converts a AgentInfo to a protobuf generated AgentInfoActionPairProto
/// </summary>

}
/// <summary>
/// Get summaries for the observations in the AgentInfo part of the AgentInfoActionPairProto.
/// </summary>
/// <param name="infoActionPair"></param>
/// <returns></returns>
public static List<ObservationSummary> GetObservationSummaries(this AgentInfoActionPairProto infoActionPair)
{
List<ObservationSummary> summariesOut = new List<ObservationSummary>();
var agentInfo = infoActionPair.AgentInfo;
foreach (var obs in agentInfo.Observations)
{
var summary = new ObservationSummary();
summary.shape = obs.Shape.ToArray();
summariesOut.Add(summary);
}
return summariesOut;
}
#endregion
#region BrainParameters
/// <summary>
/// Converts a Brain into to a Protobuf BrainInfoProto so it can be sent
/// </summary>
/// <returns>The BrainInfoProto generated.</returns>

}
/// <summary>
/// Convert a BrainParametersProto to a BrainParameters struct.
/// </summary>
/// <param name="bpp">An instance of a brain parameters protobuf object.</param>
/// <returns>A BrainParameters struct.</returns>
public static BrainParameters ToBrainParameters(this BrainParametersProto bpp)
{
var bp = new BrainParameters
{
vectorActionSize = bpp.VectorActionSize.ToArray(),
vectorActionDescriptions = bpp.VectorActionDescriptions.ToArray(),
vectorActionSpaceType = (SpaceType)bpp.VectorActionSpaceType
};
return bp;
}
#endregion
#region DemonstrationMetaData
/// <summary>
/// Convert metadata object to proto object.
/// </summary>
public static DemonstrationMetaProto ToProto(this DemonstrationMetaData dm)

ApiVersion = DemonstrationMetaData.ApiVersion,
MeanReward = dm.meanReward,
NumberSteps = dm.numberExperiences,
NumberSteps = dm.numberSteps,
NumberEpisodes = dm.numberEpisodes,
DemonstrationName = dm.demonstrationName
};

var dm = new DemonstrationMetaData
{
numberEpisodes = demoProto.NumberEpisodes,
numberExperiences = demoProto.NumberSteps,
numberSteps = demoProto.NumberSteps,
meanReward = demoProto.MeanReward,
demonstrationName = demoProto.DemonstrationName
};

}
return dm;
}
#endregion
public static UnityRLInitParameters ToUnityRLInitParameters(this UnityRLInitializationInputProto inputProto)
{

};
}
#region AgentAction
public static AgentAction ToAgentAction(this AgentActionProto aap)
{
return new AgentAction

}
return agentActions;
}
#endregion
#region Observations
public static ObservationProto ToProto(this Observation obs)
{
ObservationProto obsProto = null;

observationProto.Shape.AddRange(shape);
return observationProto;
}
#endregion
}
}

2
com.unity.ml-agents/Runtime/Demonstrations/DemonstrationWriter.cs


}
// Increment meta-data counters.
m_MetaData.numberExperiences++;
m_MetaData.numberSteps++;
m_CumulativeReward += info.reward;
if (info.done)
{

35
com.unity.ml-agents/Runtime/Timer.cs


}
/// <summary>
/// Tracks the most recent value of a metric. This is analogous to gauges in statsd.
/// Tracks the most recent value of a metric. This is analogous to gauges in statsd and Prometheus.
/// </summary>
[DataContract]
internal class GaugeNode

/// <summary>
/// The most recent value that the gauge was set to.
/// </summary>
/// <summary>
/// The smallest value that has been seen for the gauge since it was created.
/// </summary>
/// <summary>
/// The largest value that has been seen for the gauge since it was created.
/// </summary>
/// <summary>
/// The exponential moving average of the gauge value. This will take all values into account,
/// but weights older values less as more values are added.
/// </summary>
/// <summary>
/// The running average of all gauge values.
/// </summary>
[DataMember]
public float runningAverage;
/// <summary>
/// The number of times the gauge has been updated.
/// </summary>
runningAverage = value;
minValue = value;
maxValue = value;
count = 1;

{
++count;
// Update running average - see https://www.johndcook.com/blog/standard_deviation/ for formula.
runningAverage = runningAverage + (newValue - runningAverage) / count;
}
}

43
com.unity.ml-agents/Tests/Editor/TimerTest.cs


myTimer.Reset();
Assert.AreEqual(myTimer.RootNode.Children, null);
}
[Test]
public void TestGauges()
{
TimerStack myTimer = TimerStack.Instance;
myTimer.Reset();
// Simple test - adding 1's should keep that for the weighted and running averages.
myTimer.SetGauge("one", 1.0f);
var oneNode = myTimer.RootNode.Gauges["one"];
Assert.AreEqual(oneNode.weightedAverage, 1.0f);
Assert.AreEqual(oneNode.runningAverage, 1.0f);
for (int i = 0; i < 10; i++)
{
myTimer.SetGauge("one", 1.0f);
}
Assert.AreEqual(oneNode.weightedAverage, 1.0f);
Assert.AreEqual(oneNode.runningAverage, 1.0f);
// Try some more interesting values
myTimer.SetGauge("increasing", 1.0f);
myTimer.SetGauge("increasing", 2.0f);
myTimer.SetGauge("increasing", 3.0f);
myTimer.SetGauge("decreasing", 3.0f);
myTimer.SetGauge("decreasing", 2.0f);
myTimer.SetGauge("decreasing", 1.0f);
var increasingNode = myTimer.RootNode.Gauges["increasing"];
var decreasingNode = myTimer.RootNode.Gauges["decreasing"];
// Expect the running average to be (roughly) the same,
// but weighted averages will be biased differently.
Assert.AreEqual(increasingNode.runningAverage, 2.0f);
Assert.AreEqual(decreasingNode.runningAverage, 2.0f);
// The older values are actually weighted more heavily, so we expect the
// increasing series to have a lower moving average.
Assert.Less(increasingNode.weightedAverage, decreasingNode.weightedAverage);
}
}
}

92
docs/FAQ.md


## Installation problems
### Tensorflow dependency
ML Agents requires TensorFlow; if you don't already have it installed, `pip` will try to install it when you install
the ml-agents package.
it means that there is no version of TensorFlow for your python environment. Some known potential causes are:
* You're using 32-bit python instead of 64-bit. See the answer [here](https://stackoverflow.com/a/1405971/224264)
for how to tell which you have installed.
* You're using python 3.8. Tensorflow plans to release packages for this as soon as possible; see
[this issue](https://github.com/tensorflow/tensorflow/issues/33374) for more details.
* You have the `tensorflow-gpu` package installed. This is equivalent to `tensorflow`, however `pip` doesn't recognize
this. The best way to resolve this is to update to `tensorflow==1.15.0` which provides GPU support in the same package
(see the [release notes](https://github.com/tensorflow/tensorflow/issues/33374) for more details.)
* You're on another architecture (e.g. ARM) which requires vendor provided packages.
In all of these cases, the issue is a pip/python environment setup issue. Please search the tensorflow github issues
for similar problems and solutions before creating a new issue.
## Scripting Runtime Environment not setup correctly
If you haven't switched your scripting runtime version from .NET 3.5 to .NET 4.6
or .NET 4.x, you will see an error message like:
```console
error CS1061: Type `System.Text.StringBuilder' does not contain a definition for `Clear' and no extension method `Clear' of type `System.Text.StringBuilder' could be found. Are you missing an assembly reference?
```
This is because .NET 3.5 doesn't support the `Clear()` method on `StringBuilder`;
refer to [Setting Up The ML-Agents Toolkit Within
Unity](Installation.md#setting-up-ml-agent-within-unity) for the solution.
If you directly import your Unity environment without building it in the
editor, you might need to give it additional permissions to execute it.
If you receive such a permission error on macOS, run:

On Windows, you can find
[instructions](<https://technet.microsoft.com/en-us/library/cc754344(v=ws.11).aspx>).
## Environment Connection Timeout

There may be a number of possible causes:
- _Cause_: There may be no agent in the scene
- _Cause_: On OSX, the firewall may be preventing communication with the
- _Cause_: An error happened in the Unity Environment preventing communication.
_Solution_: Look into the
[log files](https://docs.unity3d.com/Manual/LogFiles.html) generated by the
Unity Environment to figure out what error happened.
- _Cause_: You have assigned `HTTP_PROXY` and `HTTPS_PROXY` values in your
If you receive an exception
`"Couldn't launch new environment because communication port {} is still in use. "`,
you can change the worker number in the Python script when calling
```python
UnityEnvironment(file_name=filename, worker_id=X)
```

If you receive a message `Mean reward : nan` when attempting to train a model
using PPO, this is due to the episodes of the Learning Environment not
terminating. In order to address this, set `Max Steps` for the
Agents within the Scene Inspector to a value greater than 0. Alternatively, it
is possible to manually set `done` conditions for episodes from within scripts
for custom episode-terminating events.
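A minimal sketch of the scripted alternative, using the renamed `OnEpisodeBegin()`/`EndEpisode()` methods; the class name and the 30-second timeout are made up for the example:

```csharp
using MLAgents;
using UnityEngine;

// Hypothetical agent that ends its episode on a custom condition so that
// "Mean reward : nan" does not occur during training.
public class TimedAgent : Agent
{
    float m_EpisodeStartTime;

    public override void OnEpisodeBegin()
    {
        m_EpisodeStartTime = Time.time;
    }

    void FixedUpdate()
    {
        // Placeholder "done" condition: terminate the episode after 30 seconds.
        if (Time.time - m_EpisodeStartTime > 30f)
        {
            EndEpisode();
        }
    }
}
```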
## Problems with training on AWS
Please refer to [Training on Amazon Web Service FAQ](Training-on-Amazon-Web-Service.md#faq)
# Known Issues
## Release 0.10.0
* ml-agents 0.10.0 and earlier were incompatible with TensorFlow 1.15.0; the graph could contain
an operator that `tensorflow_to_barracuda` didn't handle. This was fixed in the 0.11.0 release.

346
docs/Getting-Started.md


# Getting Started Guide
This guide walks through the end-to-end process of opening an ML-Agents
toolkit example environment in Unity, building the Unity executable, training an
Agent in it, and finally embedding the trained model into the Unity environment.
The ML-Agents toolkit includes a number of [example
environments](Learning-Environment-Examples.md) which you can examine to help
understand the different ways in which the ML-Agents toolkit can be used. These
environments can also serve as templates for new environments or as ways to test
new ML algorithms. After reading this tutorial, you should be able to explore
and train the example environments.
If you are not familiar with the [Unity Engine](https://unity3d.com/unity), we
highly recommend the [Roll-a-ball
tutorial](https://unity3d.com/learn/tutorials/s/roll-ball-tutorial) to learn all
the basic concepts first.
This guide walks through the end-to-end process of opening one of our
[example environments](Learning-Environment-Examples.md) in Unity, training an
Agent in it, and embedding the trained model into the Unity environment. After
reading this tutorial, you should be able to train any of the example
environments. If you are not familiar with the
[Unity Engine](https://unity3d.com/unity), view our
[Background: Unity](Background-Unity.md) page for helpful pointers.
Additionally, if you're not familiar with machine learning, view our
[Background: Machine Learning](Background-Machine-Learning.md) page for a brief
overview and helpful pointers.
For this guide, we'll use the **3D Balance Ball** environment which contains a
number of agent cubes and balls (which are all copies of each other). Each agent
cube tries to keep its ball from falling by rotating either horizontally or
vertically. In this environment, an agent cube is an **Agent** that receives a
reward for every step that it balances the ball. An agent is also penalized with
a negative reward for dropping the ball. The goal of the training process is to
have the agents learn to balance the ball on their head.
In order to install and set up the ML-Agents toolkit, the Python dependencies
and Unity, see the [installation instructions](Installation.md).
Depending on your version of Unity, it may be necessary to change the **Scripting Runtime Version** of your project. This can be done as follows:
If you haven't already, follow the [installation instructions](Installation.md).
Afterwards, open the Unity Project that contains all the example environments:
2. On the Projects dialog, choose the **Add** option at the top of the window.
3. Using the file dialog that opens, locate the `Project` folder
within the ML-Agents toolkit project and click **Open**.
4. Go to **Edit** > **Project Settings** > **Player**
5. For **each** of the platforms you target (**PC, Mac and Linux Standalone**,
**iOS** or **Android**):
1. Expand the **Other Settings** section.
2. Select **Scripting Runtime Version** to **Experimental (.NET 4.6
Equivalent or .NET 4.x Equivalent)**
6. Go to **File** > **Save Project**
1. On the Projects dialog, choose the **Add** option at the top of the window.
1. Using the file dialog that opens, locate the `Project` folder within the
ML-Agents Toolkit and click **Open**.
1. In the **Project** window, go to the
`Assets/ML-Agents/Examples/3DBall/Scenes` folder and open the `3DBall` scene
file.
_environment_. In the context of Unity, an environment is a scene containing one
or more Agent objects, and, of course, the other entities that an agent
interacts with.
![Unity Editor](images/mlagents-3DBallHierarchy.png)

window. The Inspector shows every component on a GameObject.
The first thing you may notice after opening the 3D Balance Ball scene is that
it contains not one, but several agent cubes. Each agent cube in the scene is an
independent agent, but they all share the same Behavior. 3D Balance Ball does
this to speed up training since all twelve agents contribute to training in
parallel.
### Agent

behavior:
* **Behavior Parameters** — Every Agent must have a Behavior. The Behavior
determines how an Agent makes decisions. More on Behavior Parameters in
the next section.
* **Max Step** — Defines how many simulation steps can occur before the Agent's
- **Behavior Parameters** — Every Agent must have a Behavior. The Behavior
determines how an Agent makes decisions.
- **Max Step** — Defines how many simulation steps can occur before the Agent's
When you create an Agent, you must extend the base Agent class.
The Ball3DAgent subclass defines the following methods (a condensed sketch follows the list):
* `Agent.OnEpisodeBegin()` — Called at the beginning of an Agent's episode, including at the beginning
of the simulation. The Ball3DAgent class uses this function to reset the
agent cube and ball to their starting positions. The function randomizes the reset values so that the
training generalizes to more than a specific starting position and agent cube
attitude.
* `Agent.CollectObservations(VectorSensor sensor)` — Called every simulation step. Responsible for
collecting the Agent's observations of the environment. Since the Behavior
Parameters of the Agent are set with vector observation
space with a state size of 8, the `CollectObservations(VectorSensor sensor)` must call
`VectorSensor.AddObservation()` such that vector size adds up to 8.
* `Agent.OnActionReceived()` — Called every time the Agent receives an action to take. Receives the action chosen
by the Agent. The vector action spaces result in a
small change in the agent cube's rotation at each step. The `OnActionReceived()` method
assigns a reward to the Agent; in this example, an Agent receives a small
positive reward for each step it keeps the ball on the agent cube's head and a larger,
negative reward for dropping the ball. An Agent's episode is also ended when it
drops the ball so that it will reset with a new ball for the next simulation
step.
* `Agent.Heuristic()` - When the `Behavior Type` is set to `Heuristic Only` in the Behavior
Parameters of the Agent, the Agent will use the `Heuristic()` method to generate
the actions of the Agent. As such, the `Heuristic()` method takes an array of
floats. In the case of the Ball 3D Agent, the `Heuristic()` method converts the
keyboard inputs into actions.
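Putting these together, here is a condensed, illustrative sketch of such an Agent subclass with the current API; field names, reward values, and reset ranges are invented for the example rather than copied from the actual Ball3DAgent source:

```csharp
using MLAgents;
using MLAgents.Sensors;
using UnityEngine;

public class BalanceBallAgent : Agent
{
    public GameObject ball;

    public override void OnEpisodeBegin()
    {
        // Reset the cube and give the ball a randomized start so training
        // generalizes beyond a single starting configuration.
        transform.rotation = Quaternion.identity;
        ball.transform.localPosition =
            new Vector3(Random.Range(-1.5f, 1.5f), 4f, Random.Range(-1.5f, 1.5f));
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Eight floats in total, matching the Space Size of 8 described in the next section.
        sensor.AddObservation(transform.rotation.z);
        sensor.AddObservation(transform.rotation.x);
        sensor.AddObservation(ball.transform.position - transform.position);
        sensor.AddObservation(ball.GetComponent<Rigidbody>().velocity);
    }

    public override void OnActionReceived(float[] vectorAction)
    {
        // Two continuous actions nudge the cube's rotation around z and x.
        transform.Rotate(new Vector3(0f, 0f, 1f), 2f * Mathf.Clamp(vectorAction[0], -1f, 1f));
        transform.Rotate(new Vector3(1f, 0f, 0f), 2f * Mathf.Clamp(vectorAction[1], -1f, 1f));

        if (ball.transform.position.y < transform.position.y)
        {
            AddReward(-1f);  // larger negative reward for dropping the ball
            EndEpisode();
        }
        else
        {
            AddReward(0.1f); // small positive reward for each step the ball stays up
        }
    }

    public override void Heuristic(float[] actionsOut)
    {
        // Keyboard control when Behavior Type is set to Heuristic Only.
        actionsOut[0] = -Input.GetAxis("Horizontal");
        actionsOut[1] = Input.GetAxis("Vertical");
    }
}
```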
#### Behavior Parameters : Vector Observation Space
Before making a decision, an agent collects its observation about its state in

The Behavior Parameters of the 3D Balance Ball example use a `Space Size` of 8.
This means that the feature vector containing the Agent's observations contains
eight elements: the `x` and `z` components of the agent cube's rotation and the
`x`, `y`, and `z` components of the ball's relative position and velocity. (The
observation values are defined in the Agent's
`CollectObservations(VectorSensor sensor)` method.)
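
Continuing the hypothetical `MyBalanceAgent` sketch above, a
`CollectObservations()` implementation that fills this 8-element vector might
look like the following. This is an illustrative approximation rather than the
exact Ball3DAgent code, and `ball` is the assumed field from the earlier sketch.

```csharp
// Requires `using Unity.MLAgents.Sensors;` at the top of the file.
public override void CollectObservations(VectorSensor sensor)
{
    // 2 values: the cube's tilt around the z and x axes.
    sensor.AddObservation(transform.rotation.z);
    sensor.AddObservation(transform.rotation.x);
    // 3 values: the ball's position relative to the cube.
    sensor.AddObservation(ball.transform.position - transform.position);
    // 3 values: the ball's velocity.
    sensor.AddObservation(ball.GetComponent<Rigidbody>().velocity);
    // Total: 8 floats, matching the Space Size of 8 in Behavior Parameters.
}
```

Note that `VectorSensor.AddObservation()` has overloads for single floats and
for `Vector3` values; the `Vector3` overload adds three elements at once.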
#### Behavior Parameters : Vector Action Space

An Agent is given instructions in the form of a float array of _actions_. The
ML-Agents Toolkit classifies actions into two types: the **Continuous** vector
action space is a vector of numbers that can vary continuously. What each
element of the vector means is defined by the Agent logic (the training process
just learns what values are better given particular state observations based on
the rewards received when it tries different values). For example, an element
might represent a force or torque applied to a `Rigidbody` in the Agent. The
**Discrete** action vector space defines its actions as tables; an action given
to the Agent is an array of indices into those tables.

The 3D Balance Ball example is programmed to use a continuous action space with
a `Space Size` of 2, controlling the amount of `x` and `z` rotation to apply to
the agent cube so that it keeps the ball balanced on its head.
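
The `OnActionReceived()` override for this two-action space could, continuing
the same hypothetical sketch, look roughly like the following. The rotation
scale and the reward values (a small positive reward per balanced step, a larger
negative reward for dropping the ball) mirror the behavior described above but
are illustrative rather than the exact example source.

```csharp
public override void OnActionReceived(float[] vectorAction)
{
    // Two continuous actions: small rotations around the z and x axes.
    var actionZ = 2f * Mathf.Clamp(vectorAction[0], -1f, 1f);
    var actionX = 2f * Mathf.Clamp(vectorAction[1], -1f, 1f);
    transform.Rotate(new Vector3(0f, 0f, 1f), actionZ);
    transform.Rotate(new Vector3(1f, 0f, 0f), actionX);

    if (ball.transform.position.y < transform.position.y)
    {
        // Dropping the ball ends the episode with a penalty; OnEpisodeBegin()
        // will then reset the cube and spawn the ball again.
        SetReward(-1f);
        EndEpisode();
    }
    else
    {
        // A small positive reward for every step the ball stays balanced.
        SetReward(0.1f);
    }
}
```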
## Running a Pre-trained Model

We include pre-trained models for our agents (`.nn` files) and we use the
[Unity Inference Engine](Unity-Inference-Engine.md) to run these models inside
Unity. In this section, we will use the pre-trained model for the 3D Ball
example.
1. In the **Project** window, go to the
   `Assets/ML-Agents/Examples/3DBall/Scenes` folder and open the `3DBall` scene
   file.
1. In the **Project** window, go to the
   `Assets/ML-Agents/Examples/3DBall/Prefabs` folder. Expand `3DBall` and click
   on the `Agent` prefab. You should see the `Agent` prefab in the **Inspector**
   window.

   **Note**: The platforms in the `3DBall` scene were created using the `3DBall`
   prefab. Instead of updating all 12 platforms individually, you can update the
   `3DBall` prefab instead.
1. In the **Project** window, drag the **3DBall** Model located in
   `Assets/ML-Agents/Examples/3DBall/TFModels` into the `Model` property under
   the `Behavior Parameters (Script)` component in the Agent GameObject
   **Inspector** window.
1. You should notice that each `Agent` under each `3DBall` in the **Hierarchy**
   window now contains **3DBall** as `Model` on the `Behavior Parameters`.
   **Note**: You can modify multiple game objects in a scene by selecting them
   all at once using the search bar in the Scene Hierarchy.
1. Set the **Inference Device** to use for this model to `CPU`. (CPU inference
   is faster than GPU for the majority of models generated by the ML-Agents
   Toolkit.)
1. Click the :arrow_forward: button in the Unity Editor and you will see the
   platforms balance the balls using the pre-trained model.
While we provide pre-trained `.nn` files for the agents in this environment, any
environment you make yourself will require training agents from scratch to
generate a new model file. In this section we will demonstrate how to use the
reinforcement learning algorithms that are part of the ML-Agents Python package
to accomplish this.

In order to train an agent to correctly balance the ball, we provide two deep
reinforcement learning algorithms. The default algorithm is Proximal Policy
Optimization (PPO), a method that has been shown to be more general purpose and
stable than many other RL algorithms. For more information on PPO, OpenAI has a
[blog post](https://blog.openai.com/openai-baselines-ppo/) explaining it, and
[our page](Training-PPO.md) explains how to use it in training. We also provide
Soft Actor-Critic (SAC), an off-policy algorithm that has been shown to be both
stable and sample-efficient. For more information on SAC, see UC Berkeley's
[blog post](https://bair.berkeley.edu/blog/2018/12/14/sac/) and
[our page](Training-SAC.md) for guidance on when to use SAC vs. PPO. To use SAC
to train Balance Ball, replace all references to `config/trainer_config.yaml`
with `config/sac_trainer_config.yaml` below.

To train the agents within the Balance Ball environment, we will use the
ML-Agents Python package. It provides a convenient command, `mlagents-learn`,
which accepts arguments used to configure both training and inference phases.
1. Navigate to the folder where you cloned the `ml-agents` repository. **Note**:
   If you followed the default [installation](Installation.md), then you should
   be able to run `mlagents-learn` from any directory.
1. Run `mlagents-learn config/trainer_config.yaml --run-id=first3DBallRun`.
   - `config/trainer_config.yaml` is the path to a default training
     configuration file that we provide. It includes training configurations
     for all our example environments, including 3DBall.
   - `run-id` is a unique name for this training session. Make sure to use one
     that hasn't been used already!
1. When the message _"Start training by pressing the Play button in the Unity
   Editor"_ is displayed on the screen, you can press the :arrow_forward:
   button in Unity to start training in the Editor.
**Note**: If you're using Anaconda, don't forget to activate the ml-agents
environment first.
The `--time-scale=100` sets the `Time.TimeScale` value in Unity.
**Note**: You can train using an executable rather than the Editor. To do so,
follow the instructions in
[Using an Executable](Learning-Environment-Executable.md).
**Note**: Re-running this command will start training from scratch again. To resume
a previous training run, append the `--resume` flag and give the same `--run-id` as the
run you want to resume.
If `mlagents-learn` runs correctly and starts training, you should see something
like this:

sequence_length: 64
summary_freq: 1000
use_recurrent: False
summary_path: ./summaries/first3DBallRun
model_path: ./models/first3DBallRun/3DBallLearning
INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 1000. Mean Reward: 1.242. Std of Reward: 0.746. Training.
INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 2000. Mean Reward: 1.319. Std of Reward: 0.693. Training.
INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 3000. Mean Reward: 1.804. Std of Reward: 1.056. Training.
INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 4000. Mean Reward: 2.151. Std of Reward: 1.432. Training.
INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 5000. Mean Reward: 3.175. Std of Reward: 2.250. Training.
INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 6000. Mean Reward: 4.898. Std of Reward: 4.019. Training.
INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 7000. Mean Reward: 6.716. Std of Reward: 5.125. Training.
INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 8000. Mean Reward: 12.124. Std of Reward: 11.929. Training.
INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 9000. Mean Reward: 18.151. Std of Reward: 16.871. Training.
INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 10000. Mean Reward: 27.284. Std of Reward: 28.667. Training.
Note how the `Mean Reward` value printed to the screen increases as training
progresses. This is a positive sign that training is succeeding.
### Observing Training Progress

To observe the training process in more detail, you can use TensorBoard. From
the command line, run:

```sh
tensorboard --logdir=summaries
```
Then navigate to `localhost:6006` in your browser.
From TensorBoard, you will see the summary statistics:
* **Lesson** - only interesting when performing [curriculum
training](Training-Curriculum-Learning.md). This is not used in the 3D Balance
Ball environment.
* **Cumulative Reward** - The mean cumulative episode reward over all agents. Should
increase during a successful training session.
* **Entropy** - How random the decisions of the model are. Should slowly decrease
during a successful training process. If it decreases too quickly, the `beta`
hyperparameter should be increased.
* **Episode Length** - The mean length of each episode in the environment for all
agents.
* **Learning Rate** - How large a step the training algorithm takes as it searches
for the optimal policy. Should decrease over time.
* **Policy Loss** - The mean loss of the policy function update. Correlates to how
much the policy (process for deciding actions) is changing. The magnitude of
this should decrease during a successful training session.
* **Value Estimate** - The mean value estimate for all states visited by the agent.
Should increase during a successful training session.
* **Value Loss** - The mean loss of the value function update. Correlates to how
well the model is able to predict the value of each state. This should
decrease during a successful training session.
For the purposes of this section, the most important statistic is
`Environment/Cumulative Reward`, which should increase throughout training,
eventually converging close to `100`, the maximum reward the agent can
accumulate.
![Example TensorBoard Run](images/mlagents-TensorBoard.png)

Once the training process completes and saves the model (denoted by the
`Saved Model` message), you can add it to the Unity project and use it with
compatible Agents (the Agents that generated the model). **Note:** Do not just
close the Unity Window once the `Saved Model` message appears. Either wait for
the training process to close the window or press Ctrl+C at the command-line
prompt. If you close the window manually, the `.nn` file containing the trained
model is not exported into the ml-agents folder.
If you've quit the training early using Ctrl+C and want to resume training, run
the same command again, appending the `--resume` flag:

mlagents-learn config/trainer_config.yaml --run-id=first3DBallRun --resume
Your trained model will be at `models/<run-identifier>/<behavior_name>.nn`, where
`<behavior_name>` is the `Behavior Name` of the agents corresponding to the
model. (**Note:** There is a known bug on Windows that causes the saving of the
model to fail when you terminate training early; it's recommended to wait until
Step has reached the `max_steps` parameter you set in `trainer_config.yaml`.)
This file corresponds to your model's latest checkpoint. You can now embed this
trained model into your Agents by following the steps below, which are similar
to the steps described [above](#running-a-pre-trained-model).
1. Open the Unity Editor, and select the **3DBall** scene as described above.
1. Select the **3DBall** prefab Agent object.
1. Drag the `<behavior_name>.nn` file from the Project window of the Editor to
the **Model** placeholder in the **Ball3DAgent** inspector window.
1. Press the :arrow_forward: button at the top of the Editor.
- For more information on the ML-Agents Toolkit, in addition to helpful
  background, check out the [ML-Agents Overview](ML-Agents-Overview.md) page.
- For guidance on creating your own learning environment, check out the
  [Making a New Learning Environment](Learning-Environment-Create-New.md) page.
- For an overview on the more complex example environments that are provided in
this toolkit, check out the
[Example Environments](Learning-Environment-Examples.md) page.
- For more information on the various training options available, check out the
[Training ML-Agents](Training-ML-Agents.md) page.

120
docs/Installation-Anaconda-Windows.md


# Installing ML-Agents Toolkit for Windows (Deprecated)
:warning: **Note:** We no longer use this guide ourselves and so it may not work
correctly. We've decided to keep it up just in case it is helpful to you.
The ML-Agents toolkit supports Windows 10. While it might be possible to run the
ML-Agents toolkit using other versions of Windows, it has not been tested on
other versions.

[Download](https://www.anaconda.com/download/#windows) and install Anaconda for
Windows. By using Anaconda, you can manage separate environments for different
distributions of Python. Python 3.6.1 or higher is required as we no longer support
Python 2. In this guide, we are using Python version 3.6 and Anaconda version
5.1
([64-bit](https://repo.continuum.io/archive/Anaconda3-5.1.0-Windows-x86_64.exe)
or [32-bit](https://repo.continuum.io/archive/Anaconda3-5.1.0-Windows-x86.exe)
direct links).

<img src="images/anaconda_default.PNG" alt="Anaconda Install" width="500" border="10" />
</p>
After installation, you must open **Anaconda Navigator** to finish the setup.
From the Windows search bar, type _anaconda navigator_. You can close Anaconda
Navigator after it opens.

Type `environment variables` in the search bar (this can be reached by hitting
the Windows key or the bottom left Windows button). You should see an option
called **Edit the system environment variables**.
<p align="center">
<img src="images/edit_env_var.png"

From here, click the **Environment Variables** button. Double click "Path" under
**System variable** to edit the "Path" variable, click **New** to add the
following new paths.
```console

install these Python dependencies.
If you haven't already, clone the ML-Agents Toolkit Github repository to your
local computer. You can do this using Git ([download
here](https://git-scm.com/download/win)) and running the following commands in
an Anaconda Prompt _(if you open a new prompt, be sure to activate the ml-agents
Conda environment by typing `activate ml-agents`)_:
The `--branch latest_release` option will switch to the tag of the latest stable
release. Omitting that will get the `master` branch which is potentially
unstable.
The `com.unity.ml-agents` subdirectory contains the core code to add to your
projects. The `Project` subdirectory contains many
[example environments](Learning-Environment-Examples.md) to help you get
started.
The `ml-agents` subdirectory contains a Python package which provides deep
reinforcement learning trainers to use with Unity environments.
The `ml-agents-envs` subdirectory contains a Python API to interface with Unity,
which the `ml-agents` package depends on.
Keep in mind where the files were downloaded, as you will need the trainer
config files in this directory when running `mlagents-learn`. Make sure you are
connected to the Internet and then type in the Anaconda Prompt:
```console
pip install mlagents
```

This will complete the installation of all the required software for running
the ML-Agents toolkit.
Sometimes on Windows, when you use pip to install certain Python packages, the
pip will get stuck when trying to read the cache of the package. If you see
this, you can try:
```console
pip install mlagents --no-cache-dir
```

### Installing for Development
If you intend to make modifications to `ml-agents` or `ml-agents-envs`, you
should install the packages from the cloned repo rather than from PyPi. To do
this, you will need to install `ml-agents` and `ml-agents-envs` separately.
cloned or downloaded the files, from the Anaconda Prompt, change to the
ml-agents subdirectory inside the ml-agents directory:
```console
cd C:\Downloads\ml-agents
```

```console
pip install -e .
```
Running pip with the `-e` flag will let you make changes to the Python files
directly and have those reflected when you run `mlagents-learn`. It is important
to install these packages in this order as the `mlagents` package depends on
`mlagents_envs`, and installing it in the other order will download
`mlagents_envs` from PyPi.
## (Optional) Step 4: GPU Training using The ML-Agents Toolkit

Additionally, you will need to check if your GPU is CUDA compatible. Please
check Nvidia's page [here](https://developer.nvidia.com/cuda-gpus).
Currently for the ML-Agents toolkit, only CUDA v9.0 and cuDNN v7.0.5 are
supported.
### Install Nvidia CUDA toolkit

this guide, we are using version
[9.0.176](https://developer.nvidia.com/compute/cuda/9.0/Prod/network_installers/cuda_9.0.176_win10_network-exe)).
Before installing, please make sure you **close any running instances of Unity
or Visual Studio**.
Run the installer and select the Express option. Note the directory where you
installed the CUDA toolkit. In this guide, we installed in the directory

</p>
Once you've signed up, go back to the cuDNN
[downloads page](https://developer.nvidia.com/cudnn). You may or may not be
asked to fill out a short survey. When you get to the list of cuDNN releases,
**make sure you are downloading the right version for the CUDA toolkit you
installed in Step 1.** In this guide, we are using version 7.0.5 for CUDA
toolkit version 9.0
([direct link](https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v7.0.5/prod/9.0_20171129/cudnn-9.0-windows10-x64-v7)).
After you have downloaded the cuDNN files, you will need to extract the files
into the CUDA toolkit directory.

To set the environment variable, type `environment variables` in the search bar
(this can be reached by hitting the Windows key or the bottom left Windows
button). You should see an option called **Edit the system environment
variables**.
<p align="center">
<img src="images/edit_env_var.png"

From here, click the **Environment Variables** button. Click **New** to add a
new system variable _(make sure you do this under **System variables** and not
User variables)_.
<p align="center">

</p>
For **Variable Name**, enter `CUDA_HOME`. For the variable value, put the
directory location for the CUDA toolkit. In this demonstration, the directory
location is `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0`. Press
**OK** once.
<p align="center">
<img src="images/system_variable_name_value.PNG"

To set the two path variables, inside the same **Environment Variables** window
and under the second box called **System Variables**, find a variable called
`Path` and click **Edit**. You will add two directories to the list. For this
guide, the two entries would look like:
```console

Next, install `tensorflow-gpu` using `pip`. You'll need version 1.7.1. In an
Anaconda Prompt with the Conda environment ml-agents activated, type in the
following command to uninstall TensorFlow for cpu and install TensorFlow for gpu
_(make sure you are connected to the Internet)_:
```sh
pip uninstall tensorflow
pip install tensorflow-gpu==1.7.1
```

Lastly, you should test to see if everything installed properly and that
TensorFlow can identify your GPU. In the same Anaconda Prompt, open Python in