This change moves trainer initialization outside of TrainerController,
reducing the number of TrainerController constructor arguments and
allowing trainers to be initialized in cases where a TrainerController
isn't needed.
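A minimal sketch of the intended wiring, using hypothetical names (`initialize_trainers`, simplified `Trainer` and `TrainerController` constructors) rather than the real ML-Agents API:

```python
from typing import Dict


class Trainer:
    """Simplified stand-in for an ML-Agents trainer."""

    def __init__(self, brain_name: str, config: dict):
        self.brain_name = brain_name
        self.config = config


def initialize_trainers(trainer_configs: Dict[str, dict]) -> Dict[str, Trainer]:
    # Build trainers up front, independently of any controller, so they
    # can also be used where no TrainerController exists.
    return {name: Trainer(name, cfg) for name, cfg in trainer_configs.items()}


class TrainerController:
    # The controller now receives ready-made trainers instead of all the
    # arguments needed to construct them itself.
    def __init__(self, trainers: Dict[str, Trainer], model_path: str):
        self.trainers = trainers
        self.model_path = model_path


trainers = initialize_trainers({"3DBall": {"max_steps": 50000}})
controller = TrainerController(trainers, model_path="./models/3DBall")
```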
This fixes an issue where stopping the game while training in the Editor wouldn't end training, due to the new asynchronous SubprocessEnvManager changes. Another minor change moves the `env_manager.close()` call in TrainerController to the end of `start_learning`, so that we are more likely to save the model if something goes wrong during environment shutdown (this occasionally happens on Windows machines).
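A rough sketch of the ordering change, with hypothetical, heavily simplified stand-ins for the real classes:

```python
class EnvManagerStub:
    """Hypothetical stand-in for SubprocessEnvManager, for this sketch only."""

    def close(self) -> None:
        print("environment closed")


class TrainerController:
    """Sketch only: greatly simplified relative to the real class."""

    def __init__(self, max_steps: int = 3):
        self.max_steps = max_steps
        self.step = 0

    def advance(self, env_manager) -> None:
        self.step += 1  # stand-in for one training iteration

    def _save_model(self) -> None:
        print("model saved")

    def start_learning(self, env_manager) -> None:
        try:
            while self.step < self.max_steps:
                self.advance(env_manager)
        finally:
            # Save first, close last: even if environment shutdown fails
            # (as it occasionally does on Windows), the model is on disk.
            self._save_model()
            env_manager.close()


TrainerController().start_learning(EnvManagerStub())
```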
* Included explicit version number for ZN docs
* added explicit version for KR docs
* minor fix in installation doc
* Consistency with numbers for reset parameters
* Removed extra verbiage; minor consistency fixes
* minor consistency
* Cleaned up IL language
* moved parameter sampling above in list
* Cleaned up language in Env Parameter sampling
* Cleaned up migrating content
* updated consistency of Reset Parameter Sampling
* Rename Training-Generalization-Learning.md to Training-Generalization-Reinforcement-Learning-Agents.md
* Updated doc link for generalization
* Rename Training-Generalization-Reinforcement-Learning-Agents.md to Training-Generalized-Reinforcement-Learning-Agents.md
* Re-wrote the intro paragraph for generalization
* add titles, cleaned up language for reset params
* Update Training-Generalized-Reinforcement-Learning-Agents.md
* cleanup of generalization doc
* More cleanu...
* add kor ver of README.md and empty docs, images
* add Installation.md translated to korean
* Fixed main readme docs and move all the English documents in the docs folder
* modify contents of 'Installation.md' and add kr version 'Installation-Windows.md' (not completed) with related images
* completed 1st translation of 'Installation-Windows.md' and added related images for korean docs
* add kr version 'Using-Docker.md'(not completed)
* translate Training-PPO.md to Korean
* Change word about epsilon in Training-PPO.md
* Fix Training PPO about epsilon
* completed korean translation of 'Using-Docker.md'
* Korean translation of Training Imitation Learning is finished! Also, information about the translators is added
* modified all 'blogs.unity3d.com/' to 'blogs.unity3d.com/kr'
* removed all non-translated doc
* add translator information
* Removed obsolete 'TestDstWrongShape' test as it does not reflect how Barracuda tensors work
* Added proper test cleanup, to avoid warning messages from finalizer thread.
* Hotfix for recurrent + continuous action nets in ML Agents
* Fix naming conventions for consistency
* Add generalization link to ML-Agents Overview
* Add generalization to main Readme
* Include types of samplers available for use
* Add Sampler and SamplerManager (a sketch follows after this list)
* Enable resampling of reset parameters during training
* Documentation for Sampler and example YAML configuration file
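A minimal sketch of the sampling idea; the method names and constructor signatures here are illustrative, not the exact ML-Agents implementation:

```python
import random
from typing import Dict


class UniformSampler:
    """Draws a reset parameter value uniformly from [min_value, max_value]."""

    def __init__(self, min_value: float, max_value: float):
        self.min_value = min_value
        self.max_value = max_value

    def sample(self) -> float:
        return random.uniform(self.min_value, self.max_value)


class SamplerManager:
    """Produces a fresh set of reset parameters for the next environment reset."""

    def __init__(self, samplers: Dict[str, UniformSampler]):
        self.samplers = samplers

    def sample_all(self) -> Dict[str, float]:
        return {name: s.sample() for name, s in self.samplers.items()}


# During training, reset parameters can be resampled at each reset interval:
manager = SamplerManager({
    "mass": UniformSampler(0.5, 10.0),
    "gravity": UniformSampler(7.0, 12.0),
})
new_reset_params = manager.sample_all()
```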
Brings a bucket of temp memory allocation optimizations:
* switched to Barracuda-backed tensors across the board, which helps leverage allocators and reuse internal buffers
* added the Barracuda 0.2.4 release, which brings another set of temp memory allocation fixes
* Timer proof-of-concept
* micro optimizations
* add some timers
* cleanup, add asserts
* Cleanup (no start/end methods) and handle exceptions
* unit test and decorator
* move output code, add a decorator (see the timing sketch after this list)
* cleanup
* module docstring
* actually write the timings when done with training
* use __qualname__ instead
* add a few more timers
* fix mock import
* fix unit test
* get timers from worker process (WIP)
* clean up timer merging
* typo
* WIP
* cleanup merging code
* bad merge
* undo accidental change
* remove reset command
* fix style
* fix unit tests
* fix unit tests (they got overwritten in the merge)
* get timer root through a function
* timer around communicate
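A rough sketch of a hierarchical timing decorator in the spirit of the commits above; the names (`TimerNode`, `timed`, `_root`) and structure are illustrative, not the actual ML-Agents timer module:

```python
import time
from functools import wraps
from typing import Dict


class TimerNode:
    def __init__(self):
        self.total: float = 0.0
        self.count: int = 0
        self.children: Dict[str, "TimerNode"] = {}

    def merge(self, other: "TimerNode") -> None:
        # Fold timings gathered elsewhere (e.g. a worker process) into this tree.
        self.total += other.total
        self.count += other.count
        for name, child in other.children.items():
            self.children.setdefault(name, TimerNode()).merge(child)


_root = TimerNode()
_stack = [_root]


def timed(func):
    # Record wall-clock time under the function's qualified name,
    # nested beneath whatever timer is currently active.
    @wraps(func)
    def wrapper(*args, **kwargs):
        node = _stack[-1].children.setdefault(func.__qualname__, TimerNode())
        _stack.append(node)
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            node.total += time.perf_counter() - start
            node.count += 1
            _stack.pop()

    return wrapper


@timed
def policy_update():
    time.sleep(0.01)


policy_update()
print(_root.children["policy_update"].count)  # -> 1
```

Timings gathered in worker processes can be sent back over the result pipe and folded into the main tree with `merge`, then written out once training finishes.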
* Removes unused SubprocessEnvManager import in trainer_controller
* Removes unused `steps` argument to `TrainerController._save_model`
* Consolidates unnecessary branching for curricula in
`TrainerController.advance`
* Moves `reward_buffer` into `TFPolicy` from `PPOPolicy` and adds
`BCTrainer` support so that we don't have a broken interface /
undefined behavior when BCTrainer is used with curricula.
* Don't bootstrap with a value of 0 for GAIL and Curiosity
* Add gradient penalties to the GAN to help with stability (see the sketch after this list)
* Add gail_config.yaml with GAIL examples
* Cleaned up trainer_config.yaml and removed unnecessary gammas
* Documentation updates
* Code cleanup
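For reference, a hedged sketch of a WGAN-GP-style gradient penalty on a discriminator, written in PyTorch for brevity (the actual ML-Agents GAIL code is TensorFlow-based and may differ in detail):

```python
import torch


def gradient_penalty(discriminator, expert_batch, policy_batch, coef=10.0):
    # Interpolate between expert and policy samples, then penalize the
    # discriminator's gradient norm where it deviates from 1.
    alpha = torch.rand(expert_batch.size(0), 1)
    interp = alpha * expert_batch + (1.0 - alpha) * policy_batch
    interp.requires_grad_(True)
    scores = discriminator(interp)
    grads = torch.autograd.grad(
        outputs=scores,
        inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,
    )[0]
    grad_norm = grads.norm(2, dim=1)
    return coef * ((grad_norm - 1.0) ** 2).mean()


# Toy usage with a random discriminator and random batches:
disc = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
penalty = gradient_penalty(disc, torch.randn(32, 8), torch.randn(32, 8))
```

The penalty is added to the discriminator loss, discouraging the sharp decision boundaries that tend to destabilize adversarial training.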
SubprocessEnvManager currently takes steps synchronously to reproduce the old behavior, meaning all parallel environments must wait for the slowest environment to take a step. If some steps take much longer than others, this can lead to a substantial overall slowdown in practice. In extreme cases we've seen almost a 2x speedup from using asynchronous stepping, with no downside for our faster environments (Bouncer 16% improvement, Walker 14% improvement in tests).
This PR changes the SubprocessEnvManager to use asynchronous stepping. This means that on a "step" call, the environment manager enqueues step requests to the workers and then waits only until at least one step has been completed before returning.
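A simplified sketch of the asynchronous stepping pattern described above, built directly on Python's `multiprocessing`; the worker protocol and class names are illustrative, not the real SubprocessEnvManager API:

```python
import multiprocessing as mp
import random
import time
from typing import List, Tuple


def env_worker(worker_id: int, requests: mp.Queue, results: mp.Queue) -> None:
    # Stand-in for a subprocess running one Unity environment.
    while True:
        cmd = requests.get()
        if cmd == "close":
            break
        time.sleep(random.uniform(0.01, 0.05))  # pretend env.step() takes a while
        results.put((worker_id, "step_result"))


class AsyncEnvManager:
    def __init__(self, n_workers: int = 4):
        self.results: mp.Queue = mp.Queue()
        self.requests: List[mp.Queue] = [mp.Queue() for _ in range(n_workers)]
        self.workers = [
            mp.Process(target=env_worker, args=(i, q, self.results), daemon=True)
            for i, q in enumerate(self.requests)
        ]
        for w in self.workers:
            w.start()
        self.busy = [False] * n_workers

    def step(self) -> List[Tuple[int, str]]:
        # Enqueue step requests to every idle worker...
        for i, q in enumerate(self.requests):
            if not self.busy[i]:
                q.put("step")
                self.busy[i] = True
        # ...then wait only until at least one step has completed,
        # draining any other results that happen to be ready.
        completed = [self.results.get()]
        while not self.results.empty():
            completed.append(self.results.get())
        for worker_id, _ in completed:
            self.busy[worker_id] = False
        return completed

    def close(self) -> None:
        for q in self.requests:
            q.put("close")
        for w in self.workers:
            w.join()


if __name__ == "__main__":
    manager = AsyncEnvManager(n_workers=4)
    for _ in range(5):
        print(manager.step())
    manager.close()
```

Slow environments keep stepping in the background and simply contribute their results on a later `step` call, which is where the Bouncer and Walker speedups quoted above come from.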