* Add Soft Actor-Critic model, trainer, policy, and sac_trainer_config.yaml
* Add documentation for SAC and tweak PPO documentation to reference the new pages.
* Add tests for SAC, change simple_rl test to run both PPO and SAC.
* Normalize observations when adding experiences
This change moves normalization of vector observations into the trainer's
"add_experiences" interface.
Prior to this change, normalization occurred at inference time. This
was somewhat confusing, since executing a forward pass generally shouldn't
have side effects that change the training state. Also, in an
asynchronous or distributed setting where we copy the neural network
weights from a trainer to a remote actor / inference worker, we'd end up
with training issues because the weights on the trainer would differ from
those on the workers.
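A minimal sketch of the idea, using a hypothetical RunningNormalizer helper rather than the exact ML-Agents classes: the running statistics are updated only from the add_experiences path, and the forward pass merely reads them.

```python
import numpy as np

class RunningNormalizer:
    """Running mean/variance of vector observations (Welford's algorithm)."""

    def __init__(self, obs_size: int):
        self.steps = 0
        self.mean = np.zeros(obs_size, dtype=np.float32)
        self.m2 = np.zeros(obs_size, dtype=np.float32)

    def update(self, obs_batch: np.ndarray) -> None:
        # Called from the trainer's add_experiences path, never from inference.
        for obs in obs_batch:
            self.steps += 1
            delta = obs - self.mean
            self.mean += delta / self.steps
            self.m2 += delta * (obs - self.mean)

    def normalize(self, obs: np.ndarray) -> np.ndarray:
        # A pure read of the current statistics; the forward pass has no side effects.
        variance = self.m2 / max(self.steps - 1, 1)
        return (obs - self.mean) / np.sqrt(variance + 1e-8)
```

Because the statistics only change when experiences are added on the trainer, weights copied to remote inference workers can no longer disagree with the trainer's normalization.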
We have been ignoring unused imports and star imports via flake8. Both are
bad practice and accumulate over time without automated checking. This
commit attempts to fix all existing import errors and add back the corresponding
flake8 checks.
* Add test for curiosity + SAC
* Use actions for all curiosity (need to test on PPO)
* Fix issue with reward signals updating multiple times
* Put curiosity actions in the right placeholder
* Test PPO curiosity update
This is the first in a series of PRs intended to move the agent processing logic (add_experiences and process_experiences) out of the trainer and into a separate class. The plan is to do so in steps (see the sketch after this list):
- Split the processing buffers (which keep track of agent experiences and assemble trajectories) from the update buffer (completed trajectories to be used for training) within the Trainer (this PR)
- Move the processing buffer and add/process experiences into a separate, outside class
- Change the data type of the update buffer to be a Trajectory
- Place Trajectories on queues and read them back, and add a subscription mechanism for both AgentProcessor and Trainers
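A rough sketch of where this refactor is headed, using illustrative names (Trajectory, AgentProcessor, subscribe) rather than the final API:

```python
from collections import defaultdict
from queue import Queue
from typing import Dict, List

class Trajectory:
    """A completed sequence of experiences for a single agent."""

    def __init__(self, steps: List[dict]):
        self.steps = steps

class AgentProcessor:
    """Assembles per-agent experiences into Trajectories and publishes them to queues."""

    def __init__(self) -> None:
        self._in_progress: Dict[str, List[dict]] = defaultdict(list)
        self._subscribers: List[Queue] = []

    def subscribe(self) -> Queue:
        # A Trainer subscribes to receive completed trajectories.
        queue: Queue = Queue()
        self._subscribers.append(queue)
        return queue

    def add_experience(self, agent_id: str, experience: dict, done: bool) -> None:
        self._in_progress[agent_id].append(experience)
        if done:
            trajectory = Trajectory(self._in_progress.pop(agent_id))
            for queue in self._subscribers:
                queue.put(trajectory)
```

A Trainer would then drain its queue and append each completed trajectory to its update buffer.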
* added team id and identifier concat to behavior parameters
* splitting brain params into brain name and identifiers
* set team id in prefab
* receives brain_name and identifier on Python side
* rebased with develop
* Correctly calls concatBehaviorIdentifiers
* trainer_controller expects name_behavior_ids
* add_policy and create_policy separated
* adjusting tests to expect trainer.add_policy to be called
* fixing tests
* fixed naming ...
* [bug-fix] Increase height of wall in CrawlerStatic (#3650)
* [bug-fix] Improve performance for PPO with continuous actions (#3662)
* Corrected a typo in the name of a function (#3670)
OnEpsiodeBegin was corrected to OnEpisodeBegin in the Migrating.md document
* Add Academy.AutomaticSteppingEnabled to migration (#3666)
* Fix editor port in Dockerfile (#3674)
* Hotfix memory leak on Python (#3664)
* Hotfix memory leak on Python
* Fixing
* Fixing a bug in the heuristic policy. A decision should not be requested when the agent is done
* [bug-fix] Make Python able to deal with 0-step episodes (#3671)
* adding some comments
Co-authored-by: Ervin T <ervin@unity3d.com>
* Remove vis_encode_type from list of required (#3677)
* Update changelog (#3678)
* Shorten timeout duration for environment close (#3679)
The timeout duration for closing an environment was set to the
same duration as the timeout when waiting ...
* [bug-fix] Fix regression in --initialize-from feature (#4086)
* Fixed text in the GettingStarted page specifying the logdir for TensorBoard. Previously it pointed to a summaries directory, which no longer exists; results are now saved to the results directory. (#4085)
* [refactor] Remove nonfunctional `output_path` option from TrainerSettings (#4087)
* Reverting bug introduced in #4071 (#4101)
Co-authored-by: Scott <Scott.m.jordan91@gmail.com>
Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>
* Update Dockerfile
* Separate send environment data from reset (#4128)
* Fixed a typo on ML-Agents-Overview.md (#4130)
Removed a redundant "to" from a sentence; it was probably a typo in the document.
* Updated the badge’s link to point to the newest doc version
* Pointed all of the doc links to release_3_doc
* Fix 3DBall and 3DBallHard SAC regressions (#4132)
* Move memory validation to settings
* Update docs
* Add settings test
* Update to release_3 in installation.md (#4144)
* rename to SideChannelManager +backcompat (#4137)
* Remove comment about logo with --help (#4148)
* [bugfix] Make FoodCollector heuristic playable (#4147)
* Make FoodCollector heuristic playable
* Update changelog
* script to check for old release links and references (#4153)
* Remove package validation suite from Project (#4146)
* RayPerceptionSensor: handle empty and invalid tags (#4155...
* Added Reward Providers for Torch
* Use NetworkBody to encode state in the reward providers
* Integrating the reward providers with PPO and torch
* Work in progress on the PPO integration; not training Pyramids properly at the moment
* Integration in PPO
* Removing duplicate file
* GAIL and Curiosity working
* addressing comments
* Enforce float32 for tests
* Enforce np.float32 in buffer
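The float32 enforcement amounts to coercing values as they enter the buffer; a minimal illustration with a hypothetical list-backed field (the real AgentBuffer internals differ):

```python
import numpy as np

class BufferField(list):
    """Stores experience values, coercing everything to np.float32 on insert."""

    def append(self, value) -> None:
        super().append(np.asarray(value, dtype=np.float32))

    def to_batch(self) -> np.ndarray:
        # Batches handed to the update step are guaranteed to be float32,
        # so framework ops don't silently promote to float64.
        return np.asarray(self, dtype=np.float32)
```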
This change adds an export to .nn for each checkpoint generated by
RLTrainer and adds an NNCheckpointManager to track the generated
checkpoints and final model in training_status.json.
Co-authored-by: Jonathan Harper <jharper+moar@unity3d.com>
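A sketch of the bookkeeping this describes, with assumed JSON keys and a hypothetical track_checkpoint helper; the shipped NNCheckpointManager tracks more metadata:

```python
import json
import os
import time

def track_checkpoint(status_path: str, behavior_name: str, nn_path: str, steps: int) -> None:
    """Append an exported .nn checkpoint record to training_status.json."""
    status = {}
    if os.path.exists(status_path):
        with open(status_path, "r") as f:
            status = json.load(f)
    checkpoints = status.setdefault(behavior_name, {}).setdefault("checkpoints", [])
    checkpoints.append(
        {"steps": steps, "file_path": nn_path, "creation_time": time.time()}
    )
    with open(status_path, "w") as f:
        json.dump(status, f, indent=4)
```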
* Fixes for Torch SAC and tests
* Fix recurrent SAC test
* Properly update normalization for SAC-continuous
* Fix issue with log ent coef reporting in SAC Torch
* Add torch_utils
* Use torch from torch_utils
* Add torch to banned modules in CI
* Better import error handling
* Fix flake8 errors
* Address comments
* Move networks to GPU if enabled
* Switch to torch_utils
* More flake8 problems
* Move reward providers to GPU/CPU
* Remove another set default tensor
* Fix banned import in test
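The torch_utils and GPU/CPU commits above suggest a single place that owns the torch import and the default device; a simplified sketch (function names are illustrative):

```python
import torch

_device = torch.device("cpu")

def set_torch_device(use_gpu: bool) -> None:
    """Choose the default device once, at trainer start-up."""
    global _device
    if use_gpu and torch.cuda.is_available():
        _device = torch.device("cuda")
    else:
        _device = torch.device("cpu")

def default_device() -> torch.device:
    """Networks and reward providers query this instead of hard-coding a device."""
    return _device
```

Model construction then does `network.to(default_device())`, so moving networks and reward providers between GPU and CPU stays a one-line change.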
* Moved components to the tf folder and moved the TrainerFactory to the `trainer` folder
* Addressing comments
* Editing the migrating doc
* fixing test
* Torch setup.py
* Set torch to default
* Make torch default in setup.py
* Remove indents
* Remove other instances of TF being used
* Add tensorboard to setup.py
* Adding correct setup commands for verifying torch is installed (#4524)
* Adding correct setup commands for verifying torch is installed
* Editing the test_requirements to add tf and remove torch
* Develop torchdefault raise outside setup (#4530)
* Torch not imported error to raise at first usage
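The deferred-import idea can be sketched like this: the package still imports when torch is absent, and a clear error is raised only when torch functionality is first used (the helper name is illustrative):

```python
try:
    import torch  # noqa: F401
except ImportError:
    torch = None  # type: ignore

def assert_torch_available() -> None:
    # Called at the first real torch usage rather than at package import time.
    if torch is None:
        raise ImportError(
            "PyTorch is required for this trainer but is not installed. "
            "Install it with `pip install torch`."
        )
```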
* [refactor] Use PyTorch TensorBoard utils (#4518)
* Convert stats writer to use PyTorch TB support
* Use common function to print params
* Update test
* Bump tensorboard to 1.15 to fix the tests
* Setting tensorboard 1.15.0 as the minimum version requirement
Co-authored-by: vincentpierre <vincentpierre@unity3d.com>
* [Docs] Initial documentation changes for making...
* Don't clear update buffer, but don't append to it either
* Update changelog
* Address comments
* Make experience replay buffer saving more verbose
(cherry picked from commit 63e7ad44d96b7663b91f005ca1d88f4f3b11dd2a)