SubprocessEnvManager takes steps synchronously to reproduce old
behavior, meaning all parallel environments will need to wait for
the slowest environment to take a step. If some steps take much
longer than others, this can lead to a substantial overall slowdown.
In extreme cases we have seen almost a 2x speedup from using
asynchronous stepping, with no downside for our faster environments
(Bouncer 16% improvement, Walker 14% improvement in tests).
This PR changes the SubprocessEnvManager to use async stepping.
This means on the "step" call the environment manager will enqueue
step requests to workers, and then only wait until at least one
step has been completed before returning.
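
The stepping behavior described above can be sketched roughly as follows. This is a toy illustration only, not the actual SubprocessEnvManager code: it uses threads in place of subprocesses, and the class name `ToyAsyncEnvManager`, its attributes, and the `step_durations` parameter are all hypothetical.

```python
import queue
import threading
import time


class ToyAsyncEnvManager:
    """Hypothetical stand-in for an env manager; not the ml-agents class."""

    def __init__(self, step_durations):
        self.results = queue.Queue()  # completed steps from all workers
        self.waiting = [False] * len(step_durations)
        self.request_queues = []
        for worker_id, duration in enumerate(step_durations):
            requests = queue.Queue()
            self.request_queues.append(requests)
            threading.Thread(
                target=self._worker,
                args=(worker_id, duration, requests),
                daemon=True,
            ).start()

    def _worker(self, worker_id, duration, requests):
        while True:
            requests.get()        # block until a step is requested
            time.sleep(duration)  # simulate the environment taking a step
            self.results.put(worker_id)

    def step(self):
        # Enqueue a step request for every worker that is not already busy.
        for worker_id, requests in enumerate(self.request_queues):
            if not self.waiting[worker_id]:
                requests.put("step")
                self.waiting[worker_id] = True
        # Block until at least one step is done, then drain anything else
        # that happens to have finished in the meantime.
        finished = [self.results.get()]
        while True:
            try:
                finished.append(self.results.get_nowait())
            except queue.Empty:
                break
        for worker_id in finished:
            self.waiting[worker_id] = False
        return finished
```

With one fast and one slow worker, `step()` returns as soon as the fast worker finishes, so the fast environment keeps stepping while the slow one is still busy.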
Fixes shuffling issue with newer versions of numpy (#1798).
* make get_value_estimates output a dict of floats
* Use np.append instead of converting to a list and back
* Add type hints and test for get_value_estimates
Based on the new reward signals architecture, add BC pretrainer and GAIL for PPO. Main changes:
- A new GAILRewardSignal and GAILModel for GAIL/VAIL
- A BCModule component (not a reward signal) to do pretraining during RL
- Documentation for both of these
- Change to Demo Loader that lets you load multiple demo files in a folder
- Example Demo files for all of our tested sample environments (for future regression testing)
* Using-Docker.md: add missing backslash in 3DBall command
Hi,
Just a quick edit because a backslash seems to be missing from the 3DBall command example.
* Added interactive options and Tensorboard documentation for Docker training
* Timer proof-of-concept
* micro optimizations
* add some timers
* cleanup, add asserts
* Cleanup (no start/end methods) and handle exceptions
* unit test and decorator
* move output code, add a decorator
* cleanup
* module docstring
* actually write the timings when done with training
* use __qualname__ instead
* add a few more timers
* fix mock import
* fix unit test
* don't need fwd reference
* cleanup root
* always write timers, add comments
* undo accidental change
TrainerController depended on an external_brains dictionary with
brain params in its constructor but only used it in a single function
call. The same function call (start_learning) takes the environment
as an argument, which is the source of the external_brains.
This change removes the dependency of TrainerController on
external_brains: it drops the two class members related to
external_brains and retrieves the brains directly from the environment.
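
The shape of the refactor can be illustrated with a minimal sketch. The classes below are hypothetical stand-ins, and the `external_brains` attribute on the environment and the constructor signature are assumptions made for the example, not the real ml-agents API.

```python
class FakeEnv:
    """Hypothetical environment that exposes its external brains."""

    def __init__(self, external_brains):
        self.external_brains = external_brains


class TrainerControllerSketch:
    """Hypothetical controller with no external_brains argument or member."""

    def __init__(self, run_id):
        self.run_id = run_id
        self.trainers = {}

    def start_learning(self, env):
        # The brains are read from the environment at the point of use,
        # so the constructor no longer needs to receive or store them.
        for brain_name, brain_params in env.external_brains.items():
            self.trainers[brain_name] = brain_params
        return sorted(self.trainers)
```

Because the environment is already passed to `start_learning`, callers no longer have to keep a separate brains dictionary in sync with it.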