- Move common functions to trainer.py and model.py from ppo/trainer.py, ppo/policy.py and ppo/model.py
- Introduce RLTrainer class and move most of add_experiences and some common reward
signal code there. PPO and SAC will inherit from this, not so much BC Trainer.
- Add methods to Buffer to enable sampling, truncating, and saving/loading.
- Add scoping to create encoders in model.py
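
A minimal sketch of what sampling, truncating, and saving/loading a buffer could look like. `BufferSketch` and its method names are hypothetical and only illustrate the idea; they are not the actual Buffer API touched by this change.

```python
import io

import numpy as np


class BufferSketch:
    """Hypothetical stand-in for a trainer buffer: one list per field."""

    def __init__(self):
        self.fields = {}

    def append(self, **values):
        # Store one value per field for a single step.
        for key, value in values.items():
            self.fields.setdefault(key, []).append(value)

    def __len__(self):
        return len(next(iter(self.fields.values()), []))

    def sample_mini_batch(self, batch_size, rng):
        # Uniform sampling with replacement; the same indices are used for every field.
        indices = rng.integers(0, len(self), size=batch_size)
        return {key: np.asarray(values)[indices] for key, values in self.fields.items()}

    def truncate(self, max_length):
        # Keep only the most recent max_length entries of every field.
        for key, values in self.fields.items():
            start = max(len(values) - max_length, 0)
            self.fields[key] = values[start:]

    def save(self, file_obj):
        # Write every field as an array into one .npz archive.
        np.savez(file_obj, **{key: np.asarray(values) for key, values in self.fields.items()})

    def load(self, file_obj):
        # Replace the current contents with the fields stored in the archive.
        with np.load(file_obj) as data:
            self.fields = {key: list(data[key]) for key in data.files}
```

Sampling with replacement is an assumption that suits an off-policy replay buffer; an on-policy trainer would more likely shuffle and slice instead.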
* Adds evaluate_batch to reward signals. Evaluates on minibatch rather than on BrainInfo.
* Changes the way reward signal results are reported in rl_trainer so that we get the pure, unprocessed environment reward separate from the reward signals.
* Moves end_episode to rl_trainer
* Fixed bug with BCModule with RNN
We have been ignoring unused imports and star imports via flake8. These are
both bad practice and grow over time without automated checking. This
commit attempts to fix all existing import errors and add back the corresponding
flake8 checks.
* Initial commit removing memories from C# and deprecating memory fields in proto
* initial changes to Python
* Adding functionalities
* Fixes
* adding the memories to the dictionary
* Fixing bugs
* tweaks
* Resolving bugs
* Recreating the proto
* Addressing comments
* Passing by reference does not work. Do not merge
* Fixing huge bug in Inference
* Applying patches
* fixing tests
* Addressing comments
* Renaming variable to reflect type
* test
* Modifying the .proto files
* attempt 1 at refactoring Python
* works for ppo hallway
* changing the documentation
* now works with both sac and ppo, for both training and inference
* Need to fix the tests
* TODOs :
- Fix the demonstration recorder
- Fix the demonstration loader
- verify the intrinsic reward signals work
- Fix the tests on Python
- Fix the C# tests
* Regenerating the protos
* fix proto typo
* protos and modifying the C# demo recorder
* modified the demo loader
* Demos are loading
* IMPORTANT : THESE ARE THE FILES USED FOR CONVERSION FROM OLD TO NEW FORMAT
* Modified all the demo files
* Fixing all the tests
* fixing ci
* addressing comments
* removing reference to memories in the ll-api
This is the first in a series of PRs that intend to move the agent processing logic (add_experiences and process_experiences) out of the trainer and into a separate class. The plan is to do so in steps:
- Split the processing buffers (keeping track of agent trajectories and assembling trajectories) and update buffer (complete trajectories to be used for training) within the Trainer (this PR)
- Move the processing buffer and add/process experiences into a separate, outside class
- Change the data type of the update buffer to be a Trajectory
- Place and read Trajectories from queues, add subscription mechanism for both AgentProcessor and Trainers
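
The end state of the plan above can be sketched roughly as follows. This is an illustration under stated assumptions, not the planned implementation: `Trajectory`, `AgentProcessorSketch`, `subscribe`, and `add_experience` are hypothetical names, and a plain `queue.Queue` stands in for whatever queue type the trainers end up reading from.

```python
import queue
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trajectory:
    """Hypothetical container for one agent's completed sequence of steps."""

    agent_id: str
    steps: List[dict]


class AgentProcessorSketch:
    """Collects per-agent steps and publishes finished trajectories."""

    def __init__(self, max_trajectory_length: int):
        self.max_trajectory_length = max_trajectory_length
        # Processing buffers: partial trajectories, keyed by agent id.
        self._processing: Dict[str, List[dict]] = defaultdict(list)
        # Queues of the subscribers (for example, trainers).
        self._subscribers: List[queue.Queue] = []

    def subscribe(self, trajectory_queue: queue.Queue) -> None:
        self._subscribers.append(trajectory_queue)

    def add_experience(self, agent_id: str, step: dict, done: bool) -> None:
        self._processing[agent_id].append(step)
        if done or len(self._processing[agent_id]) >= self.max_trajectory_length:
            # The trajectory is complete: hand it to every subscriber
            # and start a fresh processing buffer for this agent.
            trajectory = Trajectory(agent_id, self._processing.pop(agent_id))
            for trajectory_queue in self._subscribers:
                trajectory_queue.put(trajectory)
```

A trainer would then only read complete trajectories from its queue and append them to its update buffer, without tracking individual agents itself.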
* added team id and identifier concat to behavior parameters
* splitting brain params into brain name and identifiers
* set team id in prefab
* receives brain_name and identifier on python side
* rebased with develop
* Correctly calls concatBehaviorIdentifiers
* trainer_controller expects name_behavior_ids
* add_policy and create_policy separated
* adjusting tests to expect trainer.add_policy to be called
* fixing tests
* fixed naming ...
* Update Dockerfile
* Separate send environment data from reset (#4128)
* Fixed a typo on ML-Agents-Overview.md (#4130)
Removed a redundant "to" from the sentence; it was probably a typo in the document.
* Updated the badge’s link to point to the newest doc version
* Replaced all of the doc to release_3_doc
* Fix 3DBall and 3DBallHard SAC regressions (#4132)
* Move memory validation to settings
* Update docs
* Add settings test
* Update to release_3 in installation.md (#4144)
* rename to SideChannelManager +backcompat (#4137)
* Remove comment about logo with --help (#4148)
* [bugfix] Make FoodCollector heuristic playable (#4147)
* Make FoodCollector heuristic playable
* Update changelog
* script to check for old release links and references (#4153)
* Remove package validation suite from Project (#4146)
* RayPerceptionSensor: handle empty and invalid tags (#4155...
* Added Reward Providers for Torch
* Use NetworkBody to encode state in the reward providers
* Integrating the reward providers with ppo and torch
* work in progress, integration with PPO. Pyramids is not training properly at the moment
* Integration in PPO
* Removing duplicate file
* Gail and Curiosity working
* addressing comments
* Enforce float32 for tests
* enforce np.float32 in buffer
This change adds an export to .nn for each checkpoint generated by
RLTrainer and adds a NNCheckpointManager to track the generated
checkpoints and final model in training_status.json.
Co-authored-by: Jonathan Harper <jharper+moar@unity3d.com>
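
A rough sketch of the checkpoint tracking described in this change. The function name, the `keep` parameter, and the JSON layout are assumptions made for illustration; they are not the actual NNCheckpointManager interface or the real schema of training_status.json.

```python
import json
import os


def record_checkpoint(status_path, behavior_name, checkpoint_path, keep=5):
    """Append a checkpoint to a JSON status file and return the entries dropped.

    Hypothetical layout:
    {behavior_name: {"checkpoints": [...], "final_checkpoint": None}}
    """
    status = {}
    if os.path.exists(status_path):
        with open(status_path) as status_file:
            status = json.load(status_file)

    entry = status.setdefault(
        behavior_name, {"checkpoints": [], "final_checkpoint": None}
    )
    entry["checkpoints"].append(checkpoint_path)

    # Keep only the newest `keep` checkpoints; the caller can delete the rest from disk.
    cut = max(len(entry["checkpoints"]) - keep, 0)
    dropped = entry["checkpoints"][:cut]
    entry["checkpoints"] = entry["checkpoints"][cut:]

    with open(status_path, "w") as status_file:
        json.dump(status, status_file, indent=2)
    return dropped
```

Returning the dropped entries instead of deleting files inside the function keeps the bookkeeping separate from file-system cleanup.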
* Add torch_utils
* Use torch from torch_utils
* Add torch to banned modules in CI
* Better import error handling
* Fix flake8 errors
* Address comments
* Move networks to GPU if enabled
* Switch to torch_utils
* More flake8 problems
* Move reward providers to GPU/CPU
* Remove another set default tensor
* Fix banned import in test
* Moved components to the tf folder and moved the TrainerFactory to the `trainer` folder
* Addressing comments
* Editing the migrating doc
* fixing test
* Torch setup.py
* Set torch to default
* Make torch default in setup.py
* Remove indents
* Remove other instances of TF being used
* Add tensorboard to setup.py
* Adding correct setup commands for verifying torch is installed (#4524)
* Adding correct setup commands for verifying torch is installed
* Editing the test_requirements to add tf and remove torch
* Develop torchdefault raise outside setup (#4530)
* Torch not imported error to raise at first usage
* [refactor] Use PyTorch TensorBoard utils (#4518)
* Convert stats writer to use PyTorch TB support
* Use common function to print params
* Update test
* Bump tensorboard to 1.15 to fix the tests
* putting tensorboard 1.15.0 as min version requirement
Co-authored-by: vincentpierre <vincentpierre@unity3d.com>
* [Docs] Initial documentation changes for making...
* Fix end episode for POCA, add warning for group reward if not POCA (#5113)
* Fix end episode for POCA, add warning for group reward if not POCA
* Add missing imports
* Use np.any, which is faster
* Don't clear update buffer, but don't append to it either
* Update changelog
* Address comments
* Make experience replay buffer saving more verbose
(cherry picked from commit 63e7ad44d96b7663b91f005ca1d88f4f3b11dd2a)
* collecting latest step as a stat
* adding a list of hidden_keys to TB summarywriter to hide unnecessary stats from user
* fixing precommit
* fixing precommit
* formatting
* defined the property types
* moving custom defaults to get_default_stats_writers
* new test for TensorboardWriter.hidden_keys
* improved testing
* explicit None evaluation
Co-authored-by: Ervin T. <ervin@unity3d.com>
* make hidden_keys optional
Co-authored-by: Ervin T. <ervin@unity3d.com>
* adding optional argument
* lowering the training threshold to 0.8 on test_var_len_obs_and_goal_poca
* Update pytest.yml
* Do not merge! dropping pytest 3.9 job
* -add back pytest
-format imports and comments
* back to default threshold for test_var_len_obs_and_goal_poca
Co-authored-by: mahon94 <maryam.honari@unity3d.com>
Co-authored-by: Ervin T. <ervin@unity3d.com>