* This addresses #1835. Baselines expects single environments used with its ppo2 algorithm to be wrapped in a DummyVecEnv. The old README did not instruct the reader to do so, and the code failed to run with the latest version of baselines. This change imports DummyVecEnv from baselines and fixes the make_unity_env function described in the README (see the sketch after this list).
* Added a line to gym-unity/README.md noting the version of baselines the examples were tested with.
* Add Soft Actor-Critic (SAC) model, trainer, and policy, along with sac_trainer_config.yaml.
* Add documentation for SAC and tweak PPO documentation to reference the new pages.
* Add tests for SAC and change the simple_rl test to run both PPO and SAC.
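Referenced from the first item above: a minimal sketch of the fixed `make_unity_env`, assuming gym_unity's `UnityEnv` accepts the environment path, a worker id, and a `use_visual` flag. The structure follows the README example but is illustrative rather than verbatim.

```python
from gym_unity.envs import UnityEnv
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv


def make_unity_env(env_directory, num_env, visual, start_index=0):
    """Create Unity environment(s) wrapped the way baselines' ppo2 expects."""
    def make_env(rank, use_visual=True):
        def _thunk():
            return UnityEnv(env_directory, rank, use_visual=use_visual)
        return _thunk
    if visual:
        # Multiple visual environments can run side by side in subprocesses.
        return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)])
    # Even a single non-visual environment must be wrapped in a DummyVecEnv
    # before it is handed to ppo2.learn; this is the fix for #1835.
    return DummyVecEnv([make_env(start_index, use_visual=False)])
```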
In order for downstream packages to make use of the latest
pre-release features, we can publish pre-release versions of our
packages. For package versions ending in `devN`, pip will not
install that version by default (it is only picked up with the
`--pre` flag or an explicit version pin). This change manually
bumps our package version to a development version, with the idea
that we can publish development releases by hand for now, with the
potential for automated / nightly dev releases in the future.
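A minimal sketch of what such a dev-version bump might look like in a setup.py, assuming PEP 440 versioning; the package name and version number below are illustrative, not the actual values in this change.

```python
from setuptools import setup, find_packages

setup(
    name="mlagents",            # illustrative package name
    version="0.9.0.dev0",       # PEP 440 development (pre-release) version
    packages=find_packages(),
)
```

With a version like this, `pip install mlagents` continues to resolve to the latest stable release, while `pip install --pre mlagents` (or pinning the exact `.dev0` version) opts in to the pre-release.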
* Adds evaluate_batch to reward signals, which evaluates on a minibatch rather than on BrainInfo (see the sketch after this list).
* Changes the way reward signal results are reported in rl_trainer so that the pure, unprocessed environment reward is reported separately from the reward signal rewards.
* Moves end_episode to rl_trainer
* Fixes a bug in BCModule when used with an RNN.
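A hedged sketch of what a batched reward-signal interface could look like; the class names and the `environment_rewards` buffer key are illustrative, not the actual ML-Agents implementation. One motivation for scoring buffered mini-batches instead of live BrainInfo is that an off-policy trainer such as SAC replays old experiences.

```python
import numpy as np


class RewardSignal:
    """Base class: each signal scores a mini-batch of buffered experiences."""

    def evaluate_batch(self, mini_batch):
        raise NotImplementedError


class ExtrinsicRewardSignal(RewardSignal):
    def __init__(self, strength=1.0):
        self.strength = strength

    def evaluate_batch(self, mini_batch):
        # The per-step environment reward is already stored in the buffer,
        # so the extrinsic signal only needs to scale it.
        env_rewards = np.asarray(mini_batch["environment_rewards"], dtype=np.float32)
        return self.strength * env_rewards
```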
- Move common functions to trainer.py and model.py from ppo/trainer.py, ppo/policy.py, and ppo/model.py
- Introduce an RLTrainer class and move most of add_experiences and some common reward
signal code there. PPO and SAC will inherit from this, but the BC Trainer largely will not.
- Add methods to Buffer to enable sampling, truncating, and saving/loading (see the sketch after this list).
- Add scoping to the encoder-creation functions in model.py
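A hedged sketch of the kind of Buffer helpers described above (mini-batch sampling, truncation, and saving/loading); the method names and the pickle-based storage are illustrative, not the actual ML-Agents Buffer API.

```python
import pickle
import random


class Buffer:
    def __init__(self):
        # Maps a field name (e.g. "actions", "rewards") to a list with one
        # entry per stored step.
        self.fields = {}

    def num_experiences(self):
        return len(next(iter(self.fields.values()), []))

    def sample_mini_batch(self, batch_size):
        """Return a dict holding `batch_size` randomly chosen steps."""
        indices = random.sample(range(self.num_experiences()), batch_size)
        return {key: [values[i] for i in indices] for key, values in self.fields.items()}

    def truncate(self, max_length):
        """Drop the oldest steps so that at most `max_length` remain."""
        for key, values in self.fields.items():
            self.fields[key] = values[-max_length:]

    def save_to_file(self, file_object):
        pickle.dump(self.fields, file_object)

    def load_from_file(self, file_object):
        self.fields = pickle.load(file_object)
```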