Cosmetic and readability improvements only; no functional changes were intended.
* Utilities.cs
- Fixed comments across file
- Made class static
- Removed unnecessary imports
- Removed unused method arguments
- Renamed variables as appropriate to make usage clearer
- In AddRangeNoAlloc, suppressed (via a comment) Rider’s suggestion to revert to the built-in Range field (fixed)
- In TextureToTensorProxy, swapped the order of the first two arguments to better match the input-then-output argument convention
* UtilitiesTests.cs
- Removed unnecessary imports
- Simplified array creation commands
* GeneratorImp.cs
- Rider automatically removed trailing whitespace from otherwise-empty lines
- Changed the call to TextureToTensorProxy to mirror the new argument ordering
* Clean-up to UnityAgentsException.cs
- Removed unnecessary imports
- Fixed comment warning
- Fixed method header
* Improvements to Startup.cs
- Created const for SCENE_NAME field
- Fixed strin...
* This addresses #1835. Baselines expects single environments used with its ppo2 algorithm to be wrapped in a DummyVecEnv. The old readme did not instruct the reader to do so, and the code failed to run with the latest version of baselines. This change imports the correct function from baselines and fixes the make_unity_env function described in the readme (a sketch of the wrapping follows below).
* Added a line to gym-unity/README.md noting the version of baselines the examples were tested with.
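
As a rough illustration of the wrapping described above (a sketch only, assuming the gym-unity `UnityEnv` wrapper and a local environment binary path; the exact `make_unity_env` shown in the readme may differ in details such as seeding):

```python
from gym_unity.envs import UnityEnv
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv


def make_unity_env(env_path, num_env=1, visual=False, start_index=0):
    """Create a VecEnv-wrapped Unity environment for baselines' ppo2.

    Sketch only: the readme's actual helper may handle seeding and
    multi-worker setups differently.
    """
    def make_env(rank):
        def _thunk():
            # Each worker gets its own worker_id/port so several environments
            # can run side by side.
            return UnityEnv(env_path, worker_id=rank, use_visual=visual)
        return _thunk

    # ppo2 expects a VecEnv, so even a single environment must be wrapped.
    return DummyVecEnv([make_env(i + start_index) for i in range(num_env)])
```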
* Add a Soft Actor-Critic (SAC) model, trainer, and policy, plus sac_trainer_config.yaml (a config sketch follows this list).
* Add documentation for SAC and tweak PPO documentation to reference the new pages.
* Add tests for SAC, change simple_rl test to run both PPO and SAC.
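
For reference, a hyperparameter section in sac_trainer_config.yaml looks roughly like the following; treat this as a sketch with illustrative values, since the shipped file is the authoritative source for key names and defaults:

```yaml
default:
    trainer: sac
    batch_size: 128
    buffer_size: 50000
    buffer_init_steps: 0
    init_entcoef: 1.0
    tau: 0.005
    learning_rate: 3.0e-4
    max_steps: 5.0e4
    time_horizon: 64
    hidden_units: 128
    num_layers: 2
    normalize: false
    summary_freq: 1000
    reward_signals:
        extrinsic:
            strength: 1.0
            gamma: 0.99
```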
In order for downstream packages to make use of the latest pre-release features, we can publish pre-release versions of our packages. For package versions ending in `devN`, pip will not install that version by default. This change manually updates our package version to a development version, with the idea that we can publish development versions by hand for now, with the potential for automated / nightly dev releases later.
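
For example (a sketch; the real package name and version string vary per package), a PEP 440 development version in setup.py looks like this:

```python
# setup.py (excerpt): illustrative only; the real package metadata differs.
from setuptools import setup, find_packages

setup(
    name="mlagents",          # hypothetical package name for this sketch
    version="0.9.0.dev0",     # PEP 440 dev release: pip skips it by default
    packages=find_packages(),
)
```

Consumers who want the dev release must opt in explicitly, for example with `pip install --pre mlagents` or by pinning the exact version (`pip install mlagents==0.9.0.dev0`).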
* Adds evaluate_batch to reward signals; it evaluates on a minibatch rather than on BrainInfo (see the sketch after this list).
* Changes the way reward signal results are reported in rl_trainer so that we get the pure, unprocessed environment reward separate from the reward signals.
* Moves end_episode to rl_trainer
* Fixed a bug in BCModule when used with an RNN.
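
A rough sketch of the minibatch-shaped evaluation described above; apart from the name evaluate_batch, every class, key, and argument here is an assumption made for illustration rather than the trainer's real API:

```python
import numpy as np


class ExtrinsicRewardSignalSketch:
    """Illustrative stand-in for a reward signal with a minibatch interface."""

    def __init__(self, strength=1.0):
        self.strength = strength

    def evaluate_batch(self, mini_batch):
        # mini_batch is assumed to be a dict of arrays keyed by name, as
        # produced by the trainer's update buffer, rather than a BrainInfo.
        env_rewards = np.asarray(mini_batch["environment_rewards"], dtype=np.float32)
        # Report the raw environment reward separately from this signal's
        # scaled contribution, mirroring the reporting change above.
        return {
            "unscaled_reward": env_rewards,
            "scaled_reward": self.strength * env_rewards,
        }


# Minimal usage with a fake minibatch.
batch = {"environment_rewards": [0.0, 1.0, -0.5]}
print(ExtrinsicRewardSignalSketch(strength=2.0).evaluate_batch(batch))
```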