* Adds implementation of Curiosity-driven Exploration by Self-supervised Prediction (https://arxiv.org/abs/1705.05363) to PPO trainer.
* To enable, set use_curiosity flag to true in hyperparameter file.
* Includes refactor of unitytrainers model code to accommodate new feature.
* Adds new Pyramids environment (w/ documentation). Environment contains sparse reward, and can only be solved using PPO+Curiosity.
* 1 to 1 Brain to Agent
This is a work in progess
In this PR :
- Deleted all Brain Objects
- Moved the BrainParameters into the Agent
- Gave the Agent a Heuristic method (see Balance Ball for example)
- Modified the Communicator and ModelRunner : Put can only take one agent at a time
- Made the IBrain Interface with RequestDecision and DecideAction method
No changes made to Python
[Design Doc](https://docs.google.com/document/d/1hBhBxZ9lepGF4H6fc6Hu6AW7UwOmnyX3trmgI3HpOmo/edit#)
* Removing editorconfig
* Updating BallanceBall scene
* grammar mistake
* Clearing the Agents of the Model runner
* Added Documentation on IBrain
* Modified comments on GiveModel
* Introduced a factory
* Split Learning Brain in two
* Changes to walljump
* Fixing the Unit tests
* Renaming the Brain to Policy
* Heuristic now has priority over training
* Edited code comments
* Fixing bugs
* Develop one to one scene edits...
* Triming some of the methods of the agent but left SetReward
* Fixing bugs
* modifying the environments
* Reintroducing IsDone and IsMaxStepReached
* Updating the Migrating doc
* more details on the Migration
* Add the VectorSensor to the CollectObservation call
* Example of API change for BalanceBall
* Modified the Examples
* Changes to the migrating doc
* Editing the docs
* Update docs/Learning-Environment-Design-Agents.md
Co-Authored-By: Chris Elion <chris.elion@unity3d.com>
* Update docs/Migrating.md
Co-Authored-By: Chris Elion <chris.elion@unity3d.com>
* Update docs/Migrating.md
Co-Authored-By: Chris Elion <chris.elion@unity3d.com>
* Update docs/Getting-Started-with-Balance-Ball.md
Co-Authored-By: Chris Elion <chris.elion@unity3d.com>
* addressing comments
* Removed the MLAgents.Sensor namespace
* Removing the MLAgents.Sensor namespace from the tests
* Editing the migrating docs
Co-authored-by: Chris Elion <celion@gmail.com>
* [skip ci] Renamed methods in the Agent class
WARNING, the user when implementing obsolete methods will see the message :Member `old method` overrides obsolete member `old method`. Add the Obsolete attribute to `old method`. It will not suggest the new method to override.
* [skip ci] Updated the example environment
* [skip ci] Updated migrating and changelog
* [skip ci] Editing the docs
* [skip ci] Missing docs
* :+1
* Update docs/Getting-Started-with-Balance-Ball.md
Co-Authored-By: Chris Elion <chris.elion@unity3d.com>
* Update docs/Learning-Environment-Create-New.md
Co-Authored-By: Chris Elion <chris.elion@unity3d.com>
* Update docs/Learning-Environment-Create-New.md
Co-Authored-By: Chris Elion <chris.elion@unity3d.com>
* [skip ci] documentation changes
* [skip ci] Update docs/Getting-Started-with-Balance-Ball.md
* [skip ci] Update docs/Getting-Started-with-Balance-Ball.md
* [skip ci] Update docs/Gett...
Initial commit
second commit. The no-extrinsic was trained without the log reward (reward = prob) while the new one is (reward = log_prob - log_prior)
A few results, it looks like Walker-diverse-r05-bigger.onnx is doing something
Modified pushblock
using next state and action. Did not help
Fixing bug that had 9 diversity settings instead of 8
removing results