* Create new class (RewardSignal) that represents a reward signal.
* Add value heads for each reward signal in the PPO model.
* Make summaries agnostic to the type of reward signals, and log weighted rewards per reward signal.
* Move extrinsic and curiosity rewards into this new structure.
* Allow defining multiple reward signals in YAML file. Add documentation for this new structure.
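A minimal sketch of how the structure described above could fit together: one object per reward signal, each producing a weighted reward that can be logged separately. The `strength` field, the `evaluate` and `raw_rewards` method names, the `ConstantBonusSignal` stand-in, and the config layout are illustrative assumptions, not the actual classes or YAML schema introduced by this change.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class RewardSignal(ABC):
    """Hypothetical base class: one instance per configured reward signal."""

    def __init__(self, strength: float) -> None:
        # `strength` is an assumed name for the per-signal weight.
        self.strength = strength

    @abstractmethod
    def raw_rewards(self, env_rewards: List[float]) -> List[float]:
        """Return the unweighted reward for each step."""

    def evaluate(self, env_rewards: List[float]) -> List[float]:
        # Weighted reward: the quantity that would be logged per signal.
        return [self.strength * r for r in self.raw_rewards(env_rewards)]


class ExtrinsicSignal(RewardSignal):
    """Passes the environment reward through unchanged."""

    def raw_rewards(self, env_rewards: List[float]) -> List[float]:
        return list(env_rewards)


class ConstantBonusSignal(RewardSignal):
    """Stand-in for an intrinsic signal such as curiosity (not a real model)."""

    def raw_rewards(self, env_rewards: List[float]) -> List[float]:
        return [1.0 for _ in env_rewards]


def build_signals(config: Dict[str, Dict[str, float]]) -> Dict[str, RewardSignal]:
    # Maps a config key to a signal class; the keys are illustrative only.
    registry = {"extrinsic": ExtrinsicSignal, "bonus": ConstantBonusSignal}
    return {
        name: registry[name](strength=params["strength"])
        for name, params in config.items()
    }


# A dict shaped as if it had been parsed from a YAML file.
config = {"extrinsic": {"strength": 1.0}, "bonus": {"strength": 0.5}}
signals = build_signals(config)
weighted = {name: s.evaluate([1.0, 0.0, 2.0]) for name, s in signals.items()}
```

Keeping the weighting in the base class means summary code only needs to iterate over the dictionary of signals, without knowing which types are present.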
Based on the new reward signals architecture, add BC pretrainer and GAIL for PPO. Main changes:
- A new GAILRewardSignal and GAILModel for GAIL/VAIL
- A BCModule component (not a reward signal) to do pretraining during RL
- Documentation for both of these
- Change to Demo Loader that lets you load multiple demo files in a folder
- Example Demo files for all of our tested sample environments (for future regression testing)
- Move common functions to trainer.py, model.py from ppo/trainer.py, ppo/policy.py and ppo/model.py
- Introduce RLTrainer class and move most of add_experiences and some common reward
signal code there. PPO and SAC will inherit from this, not so much BC Trainer.
- Add methods to Buffer to enable sampling, truncating, and save/loading.
- Add scoping to create encoders in model.py
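The Buffer methods mentioned above (sampling, truncating, and save/loading) could look roughly like the following. This is a sketch under stated assumptions: `SimpleBuffer`, its field names, and the JSON-string serialization are hypothetical and chosen for brevity; the real Buffer class and its on-disk format are not shown in this log.

```python
import json
import random
from typing import Dict, List


class SimpleBuffer:
    """Hypothetical stand-in for the trainer's Buffer; names are illustrative."""

    def __init__(self) -> None:
        self.fields: Dict[str, List[float]] = {}

    def append(self, key: str, value: float) -> None:
        self.fields.setdefault(key, []).append(value)

    @property
    def num_experiences(self) -> int:
        # Assumes every field holds the same number of entries.
        return len(next(iter(self.fields.values()))) if self.fields else 0

    def truncate(self, max_length: int) -> None:
        # Keep only the most recent `max_length` entries of every field.
        start = max(0, self.num_experiences - max_length)
        self.fields = {k: v[start:] for k, v in self.fields.items()}

    def sample(self, batch_size: int, rng: random.Random) -> Dict[str, List[float]]:
        # Draw indices with replacement and reuse them for every field,
        # so that entries from different fields stay aligned.
        idx = [rng.randrange(self.num_experiences) for _ in range(batch_size)]
        return {k: [v[i] for i in idx] for k, v in self.fields.items()}

    def dumps(self) -> str:
        # Serialize to a JSON string (an assumed format, for illustration).
        return json.dumps(self.fields)

    @classmethod
    def loads(cls, text: str) -> "SimpleBuffer":
        buf = cls()
        buf.fields = json.loads(text)
        return buf


buffer = SimpleBuffer()
for step in range(5):
    buffer.append("rewards", float(step))
    buffer.append("values", float(step) * 10.0)

buffer.truncate(3)
batch = buffer.sample(4, random.Random(0))
restored = SimpleBuffer.loads(buffer.dumps())
```

Sampling with shared indices matters for off-policy trainers such as SAC, where each sampled reward must stay paired with the observation and action from the same step.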
* Adds evaluate_batch to reward signals. Evaluates on minibatch rather than on BrainInfo.
* Changes the way reward signal results are reported in rl_trainer so that we get the pure, unprocessed environment reward separate from the reward signals.
* Moves end_episode to rl_trainer
* Fixed bug with BCModule with RNN
We have been ignoring unused imports and star imports via flake8. These are
both bad practice and grow over time without automated checking. This
commit attempts to fix all existing import errors and add back the corresponding
flake8 checks.
* Feature Deprecation : Online Behavioral Cloning
In this PR :
- Delete the online_bc_trainer
- Delete the tests for online bc
- delete the configuration file for online bc training
* Deleting the BCTeacherHelper.cs Script
TODO :
- Remove usages in the scene
- Documentation Edits
*DO NOT MERGE*
* IMPORTANT : REMOVED ALL IL SCENES
- Removed all the IL scenes from the Examples folder
* Removed all mentions of online BC training in the Documentation
* Made a note in the Migrating.md doc about the removal of the Online BC feature.
* Modified the Academy UI to remove the control checkbox and replaced it with a train in the editor checkbox
* Removed the Broadcast functionality from the non-Learning brains
* Bug fix
* Note that the scenes are broken since the BroadcastHub has changed
* Modified the LL-API for Python to remove the broadcasting functionality.
* All unit tests are running
* Modifie...
* Add test for curiosity + SAC
* Use actions for all curiosity (need to test on PPO)
* Fix issue with reward signals updating multiple times
* Put curiosity actions in the right placeholder
* Test PPO curiosity update
* ISensor and SensorBase
* camera and rendertex first pass
* use isensors for visual obs
* Update gridworld with CameraSensors
* compressed obs for reals
* Remove AgentInfo.visualObservations
* better separation of train and inference sensor calls
* compressed obs proto - need CI to generate code
* int32
* get proto name right
* run protoc locally for new files
* apply generated proto patch (pyi files were weird)
* don't repeat bytes
* hook up compressedobs
* dont send BrainParameters until there's an AgentInfo
* python BrainParameters now needs an AgentInfo to create
* remove last (I hope) dependency on camerares
* remove CameraResolutions and AgentInfo.visual_observations
* update mypy-protobuf version
* cleanup todos
* python cleanup
* more unit test fixes
* more unit test fix
* camera sensors for VisualFood collector, record demo
* SensorCompon...
* Modifying the .proto files
* attempt 1 at refactoring Python
* works for ppo hallway
* changing the documentation
* now works with both sac and ppo both training and inference
* Need to fix the tests
* TODOs :
- Fix the demonstration recorder
- Fix the demonstration loader
- verify the intrinsic reward signals work
- Fix the tests on Python
- Fix the C# tests
* Regenerating the protos
* fix proto typo
* protos and modifying the C# demo recorder
* modified the demo loader
* Demos are loading
* IMPORTANT : THESE ARE THE FILES USED FOR CONVERSION FROM OLD TO NEW FORMAT
* Modified all the demo files
* Fixing all the tests
* fixing ci
* addressing comments
* removing reference to memories in the ll-api
This is the first in a series of PRs that intend to move the agent processing logic (add_experiences and process_experiences) out of the trainer and into a separate class. The plan is to do so in steps:
- Split the processing buffers (keeping track of agent trajectories and assembling trajectories) and update buffer (complete trajectories to be used for training) within the Trainer (this PR)
- Move the processing buffer and add/process experiences into a separate, outside class
- Change the data type of the update buffer to be a Trajectory
- Place and read Trajectories from queues, add subscription mechanism for both AgentProcessor and Trainers
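The end state of the plan above (an outside class that assembles trajectories and publishes them to queues that trainers subscribe to) might be sketched as follows. Only `AgentProcessor`, `Trajectory`, and the update buffer are named in the text; the method names `subscribe`, `add_experience`, and `advance`, the `max_trajectory_length` parameter, and the use of `queue.Queue` are assumptions made for illustration.

```python
import queue
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Trajectory:
    """A completed (or length-capped) run of experiences for one agent."""
    agent_id: str
    rewards: List[float] = field(default_factory=list)


class AgentProcessor:
    """Tracks in-progress trajectories and publishes finished ones."""

    def __init__(self, max_trajectory_length: int) -> None:
        self.max_trajectory_length = max_trajectory_length
        self._in_progress: Dict[str, List[float]] = {}
        self._subscribers: List["queue.Queue[Trajectory]"] = []

    def subscribe(self, trajectory_queue: "queue.Queue[Trajectory]") -> None:
        self._subscribers.append(trajectory_queue)

    def add_experience(self, agent_id: str, reward: float, done: bool) -> None:
        rewards = self._in_progress.setdefault(agent_id, [])
        rewards.append(reward)
        # Publish when the episode ends or the trajectory reaches its cap.
        if done or len(rewards) >= self.max_trajectory_length:
            trajectory = Trajectory(agent_id, self._in_progress.pop(agent_id))
            for subscriber in self._subscribers:
                subscriber.put(trajectory)


class Trainer:
    """Reads trajectories from its queue into an update buffer."""

    def __init__(self) -> None:
        self.trajectory_queue: "queue.Queue[Trajectory]" = queue.Queue()
        self.update_buffer: List[float] = []

    def advance(self) -> None:
        # Drain everything currently in the queue without blocking.
        while True:
            try:
                trajectory = self.trajectory_queue.get_nowait()
            except queue.Empty:
                break
            self.update_buffer.extend(trajectory.rewards)


processor = AgentProcessor(max_trajectory_length=3)
trainer = Trainer()
processor.subscribe(trainer.trajectory_queue)

processor.add_experience("agent-0", 1.0, done=False)
processor.add_experience("agent-0", 2.0, done=True)
processor.add_experience("agent-1", 0.5, done=False)  # still in progress
trainer.advance()
```

With this split, the trainer never touches partial trajectories: it only sees what the processor has published, which is the separation between processing buffers and the update buffer that the first step of the plan introduces.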
* [skip ci] WIP : Modify the base_env.py file
* [skip ci] typo
* [skip ci] renamed some methods
* [skip ci] Incorporated changes from our meeting
* [skip ci] everything is broken
* [skip ci] everything is broken
* [skip ci] formatting
* Fixing the gym tests
* Fixing bug, C# has an error that needs fixing
* Fixing the test
* relaxing the threshold of 0.99 to 0.9
* fixing the C# side
* formatting
* Fixed the llapi integration test
* [Increasing steps for testing]
* Fixing the python tests
* Need __contains__ after all
* changing the max_steps in the tests
* addressing comments
* Making env_manager logic clearer as proposed in the comments
* Remove duplicated logic and added back in episode length (#3728)
* removing mentions of multi-agent in gym and changed the docstring in base_env.py
* Edited the Documentation for the changes to the LLAPI (#3733)
* Edite...
* Update Dockerfile
* Separate send environment data from reset (#4128)
* Fixed a typo on ML-Agents-Overview.md (#4130)
Removed a redundant "to" from the sentence, since it is probably a typo in the document.
* Updated the badge’s link to point to the newest doc version
* Replaced all doc references with release_3_doc
* Fix 3DBall and 3DBallHard SAC regressions (#4132)
* Move memory validation to settings
* Update docs
* Add settings test
* Update to release_3 in installation.md (#4144)
* rename to SideChannelManager +backcompat (#4137)
* Remove comment about logo with --help (#4148)
* [bugfix] Make FoodCollector heuristic playable (#4147)
* Make FoodCollector heuristic playable
* Update changelog
* script to check for old release links and references (#4153)
* Remove package validation suite from Project (#4146)
* RayPerceptionSensor: handle empty and invalid tags (#4155...
* Make buffer type-agnostic
* Edit types of Append method
* Change comment
* Collaborative walljump
* Make collab env harder
* Add group ID
* Add collab obs to trajectory
* Fix bug; add critic_obs to buffer
* Set group ids for some envs
* Pretty broken
* Less broken PPO
* Update SAC, fix PPO batching
* Fix SAC interrupted condition and typing
* Fix SAC interrupted again
* Remove erroneous file
* Fix multiple obs
* Update curiosity reward provider
* Update GAIL and BC
* Multi-input network
* Some minor tweaks but still broken
* Get next critic observations into value estimate
* Temporarily disable exporting
* Use Vince's ONNX export code
* Cleanup
* Add walljump collab YAML
* Lower max height
* Update prefab
* Update prefab
* Collaborative Hallway
* Set num teammates to 2
* Add config and group ids to HallwayCollab
* Fix bug with hallway collab
* E...