Based on the new reward signals architecture, add BC pretrainer and GAIL for PPO. Main changes:
- A new GAILRewardSignal and GAILModel for GAIL/VAIL
- A BCModule component (not a reward signal) to do pretraining during RL
- Documentation for both of these
- Change to Demo Loader that lets you load multiple demo files in a folder
- Example Demo files for all of our tested sample environments (for future regression testing)
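The Demo Loader change can be pictured with a minimal sketch. The helper name `find_demo_files` is hypothetical and not taken from the codebase; the sketch assumes demonstration files use the `.demo` extension and that the given path may point to either a single file or a folder.

```python
import os
from typing import List


def find_demo_files(path: str) -> List[str]:
    """Hypothetical helper: resolve a path to a sorted list of .demo files."""
    if os.path.isdir(path):
        # Collect every .demo file directly inside the folder (non-recursive).
        paths = [
            os.path.join(path, name)
            for name in sorted(os.listdir(path))
            if name.endswith(".demo")
        ]
        if not paths:
            raise ValueError(f"No .demo files found in folder: {path}")
        return paths
    if os.path.isfile(path) and path.endswith(".demo"):
        # A single demo file is returned as a one-element list.
        return [path]
    raise FileNotFoundError(f"Not a .demo file or a folder: {path}")
```

Returning a list in both cases lets the caller treat a single file and a folder the same way.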
* Don't bootstrap with a value of 0 for GAIL and Curiosity
* Add gradient penalties to GAN to help with stability
* Add gail_config.yaml with GAIL examples
* Cleaned up trainer_config.yaml and unnecessary gammas
* Documentation updates
* Code cleanup
* Included explicit version # for ZN
* added explicit version for KR docs
* minor fix in installation doc
* Consistency with numbers for reset parameters
* Removed extra verbiage. minor consistency
* minor consistency
* Cleaned up IL language
* moved parameter sampling above in list
* Cleaned up language in Env Parameter sampling
* Cleaned up migrating content
* updated consistency of Reset Parameter Sampling
* Rename Training-Generalization-Learning.md to Training-Generalization-Reinforcement-Learning-Agents.md
* Updated doc link for generalization
* Rename Training-Generalization-Reinforcement-Learning-Agents.md to Training-Generalized-Reinforcement-Learning-Agents.md
* Re-wrote the intro paragraph for generalization
* add titles, cleaned up language for reset params
* Update Training-Generalized-Reinforcement-Learning-Agents.md
* cleanup of generalization doc
* More cleanu...
- Fix issue with BC Trainer `increment_steps`.
- Fix issue with Demonstration Recorder and visual observations (memory leak fix was deleting vis obs too early).
- Make Samplers sample from the same random seed every time, so generalization runs are repeatable.
- Fix crash when using GAIL, Curiosity, and visual observations together.
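As an illustration of the sampler seeding fix, here is a minimal sketch of a sampler that owns a seeded random generator. The class name `UniformSampler`, its parameters, and the default seed of 0 are assumptions made for this example, not the actual sampler API.

```python
import random


class UniformSampler:
    """Hypothetical sampler that draws from [min_value, max_value]."""

    def __init__(self, min_value: float, max_value: float, seed: int = 0):
        self.min_value = min_value
        self.max_value = max_value
        # A private, seeded generator: the sequence of samples depends only
        # on the seed, not on any other use of the global random state.
        self._rng = random.Random(seed)

    def sample(self) -> float:
        return self._rng.uniform(self.min_value, self.max_value)
```

Two samplers built with the same seed produce the same sequence of values, which is what makes a generalization run repeatable.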
* Adds evaluate_batch to reward signals. Evaluates on minibatch rather than on BrainInfo.
* Changes the way reward signal results are reported in rl_trainer so that we get the pure, unprocessed environment reward separate from the reward signals.
* Moves end_episode to rl_trainer
* Fixed bug with BCModule with RNN
We have been ignoring unused imports and star imports via flake8. These are
both bad practice and grow over time without automated checking. This
commit attempts to fix all existing import errors and add back the corresponding
flake8 checks.
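For context, the two problems correspond to flake8's F401 (module imported but unused) and F403 (star import) checks. The sketch below is not how flake8 is implemented; it is a simplified illustration using the standard `ast` module, and the function name `find_import_problems` is hypothetical.

```python
import ast
from typing import List, Optional, Tuple


def find_import_problems(source: str) -> Tuple[List[Optional[str]], List[str]]:
    """Return (modules that are star-imported, imported names never used)."""
    tree = ast.parse(source)
    star_imports = []
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name == "*":
                    star_imports.append(node.module)
                else:
                    imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds only the top-level name `a`.
                imported.add(alias.asname or alias.name.split(".")[0])
    # Every bare identifier that appears anywhere in the module.
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    return star_imports, sorted(imported - used)
```

This simplified check ignores `__all__`, re-exports, and scoping, all of which a real linter has to handle.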
* Modifying the .proto files
* attempt 1 at refactoring Python
* works for ppo hallway
* changing the documentation
* now works with both SAC and PPO, for both training and inference
* Need to fix the tests
* TODOs:
  - Fix the demonstration recorder
  - Fix the demonstration loader
  - verify the intrinsic reward signals work
  - Fix the tests on Python
  - Fix the C# tests
* Regenerating the protos
* fix proto typo
* protos and modifying the C# demo recorder
* modified the demo loader
* Demos are loading
* IMPORTANT : THESE ARE THE FILES USED FOR CONVERSION FROM OLD TO NEW FORMAT
* Modified all the demo files
* Fixing all the tests
* fixing ci
* addressing comments
* removing reference to memories in the ll-api
This is the first in a series of PRs that intend to move the agent processing logic (add_experiences and process_experiences) out of the trainer and into a separate class. The plan is to do so in steps:
- Split the processing buffers (keeping track of agent trajectories and assembling trajectories) and update buffer (complete trajectories to be used for training) within the Trainer (this PR)
- Move the processing buffer and add/process experiences into a separate, outside class
- Change the data type of the update buffer to be a Trajectory
- Place and read Trajectories from queues, add subscription mechanism for both AgentProcessor and Trainers
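The split described in the first step can be sketched as follows. This is an illustrative toy, not the Trainer code: the class name `TrainerSketch`, the `add_experience` signature, and the use of plain lists for trajectories are all assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, List


class TrainerSketch:
    """Hypothetical trainer holding the two buffers separately."""

    def __init__(self) -> None:
        # Processing buffer: per-agent steps of trajectories still in progress.
        self.processing_buffer: Dict[str, List[Any]] = defaultdict(list)
        # Update buffer: completed trajectories, ready to be used for training.
        self.update_buffer: List[List[Any]] = []

    def add_experience(self, agent_id: str, step: Any, done: bool) -> None:
        self.processing_buffer[agent_id].append(step)
        if done:
            # Move the finished trajectory out of the processing buffer.
            self.update_buffer.append(self.processing_buffer.pop(agent_id))
```

Keeping the buffers apart is what would later allow the processing side to move into a separate class without touching the training side.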
* Update Dockerfile
* Separate send environment data from reset (#4128)
* Fixed a typo on ML-Agents-Overview.md (#4130)
Removed a redundant "to" from the sentence; it was probably a typo in the document.
* Updated the badge’s link to point to the newest doc version
* Changed all of the doc references to release_3_doc
* Fix 3DBall and 3DBallHard SAC regressions (#4132)
* Move memory validation to settings
* Update docs
* Add settings test
* Update to release_3 in installation.md (#4144)
* rename to SideChannelManager +backcompat (#4137)
* Remove comment about logo with --help (#4148)
* [bugfix] Make FoodCollector heuristic playable (#4147)
* Make FoodCollector heuristic playable
* Update changelog
* script to check for old release links and references (#4153)
* Remove package validation suite from Project (#4146)
* RayPerceptionSensor: handle empty and invalid tags (#4155...
* Moved components to the tf folder and moved the TrainerFactory to the `trainer` folder
* Addressing comments
* Editing the migrating doc
* fixing test