* Adds implementation of Curiosity-driven Exploration by Self-supervised Prediction (https://arxiv.org/abs/1705.05363) to PPO trainer.
* To enable, set use_curiosity flag to true in hyperparameter file.
* Includes refactor of unitytrainers model code to accommodate new feature.
* Adds new Pyramids environment (w/ documentation). The environment has sparse rewards and can only be solved using PPO+Curiosity.
* [containers] Enables container support for scenes that use visual observations
* [Initial Commit] Works only with simple balance ball
* [Optimization] Store the academy in the brainBatcher as a temporary measure
* [Modifications] Made it work from the editor as a prototype
* [Made socket communicator and reimplemented all functionalities]
* [Forgotten file] removed .meta file
* [Forgot the meta file]
* [Metafile] deleted metafile
* [Comments] Removed dead code
* [Comments] Added some descriptions
* [Bug Fix] Multi brain scenario
* [improved AgentInfo converter]
* [Optimization] Remove VectorObs since StackedVectorObs is present in the AgentInfo protobuf object
* [Timeout] Implemented a timeout for the rpc communicator in Unity
* [Libraries] Added the C# Protobuf and Grpc libraries
* [Requirements] Added protobuf 3.5.2 to the requirements
* [Code Formatting] Removed dead code and split some lines
...
* some random change so that I can create this PR
* docs update for TensorFlowSharp new version
* changed the links to the new unitypackage file
* resolved conflicts, updated the pictures for CUDA 9.0
* fixed a typo
* resolved Arthur's comment
* blurred the usernames
* modified the AWS doc
* resolved Vince's comment
* [Refactor] Fixed line indentation
* Removed the library Newtonsoft.Json from the monitor
* Replaced calls to JSON conversion with manual conversion
* [Modified] The Monitor now has multiple Log methods that take different object types
- Indent the section about providing actions to multiple brains to be in line with the rest of the step() docs.
- Move the line about what step() returns closer to the top of the docs so it's harder to overlook.
- Add a small code snippet about how to get BrainInfo belonging to a specific brain and how to get data from that BrainInfo object.
Fixes the issue raised by @hsaikia in #552
Added the memory_size variable to the BC model
Added memory_size and recurrent_out to the output nodes of the graph when using BC with LSTM
* First draft of Azure support docs
* Correcting links to other docs
* Adding additional links and cleaning instructions
* Adding references to Azure docs in other appropriate places
* [Cold Fix] Split the way cumulative rewards and episode length are counted
The reward is appended at each step to the cumulative reward
The episode count is ONLY incremented when d_t+1 is false
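A minimal sketch of the split counting described above, assuming "d_t+1" refers to the done flag of the next step. The tracker names and helper functions are hypothetical and are not taken from the trainer code.

```python
from collections import defaultdict

# Hypothetical per-agent trackers; names are illustrative only.
cumulative_rewards = defaultdict(float)
episode_steps = defaultdict(int)


def record_step(agent_id, reward, next_done):
    """Update one agent's running statistics after one environment step."""
    # The reward is always added, including on the terminal step.
    cumulative_rewards[agent_id] += reward
    # The step counter only advances when the next state is not terminal
    # (assumed reading of "d_t+1 is false").
    if not next_done:
        episode_steps[agent_id] += 1


def finish_episode(agent_id):
    """Return (total reward, counted steps) and reset the agent's trackers."""
    return cumulative_rewards.pop(agent_id, 0.0), episode_steps.pop(agent_id, 0)
```

For example, three calls with rewards 1.0, 0.5, and 2.0, where only the last one has `next_done=True`, accumulate all three rewards but count two steps.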
This PR makes the following changes:
* Moves clipping of continuous control model into model itself. Output is now always [-1, 1].
* Internal model values are now clipped between [-3, 3] before being rescaled to [-1, 1] for output. This improves training performance by providing a wider range of values within which the PDF of the Gaussian can fall. Output of [-1, 1] is used to be more environment-creator friendly.
* Fixes issue where epsilon was erroneously being used to reconstruct old probabilities during PPO update, leading to reduced learning performance.
* Introduces ScaleAction() function within Python to easily rescale values from [-1, 1] to an arbitrary range.
* Re-trains all CC models using improved algorithm. All performance levels are equal or improved. In the case of Crawler, improvement is drastic.
* Updates documentation appropriately.
* Makes miscellaneous minor code style and optimization improvements within environments.
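
The clipping and rescaling changes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `squash_action` and `scale_action` are hypothetical names (the latter standing in for ScaleAction()), and a linear mapping is assumed for the rescale.

```python
def squash_action(raw_action, clip=3.0):
    """Clip a raw model value to [-clip, clip], then rescale it to [-1, 1]."""
    clipped = max(-clip, min(clip, raw_action))
    return clipped / clip


def scale_action(action, low, high):
    """Linearly map an action in [-1, 1] onto [low, high] (assumed behavior)."""
    return low + (action + 1.0) * 0.5 * (high - low)
```

An environment that expects, say, a steering value in [2, 4] could then call `scale_action(squash_action(x), 2.0, 4.0)` on each raw model value `x`.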