* `learn.py` is now the main script for training brains.
* Simultaneous multi-brain training is now possible.
* `ghost-trainer` allows for proper training in adversarial scenarios.
* `imitation-trainer` provides a basic implementation of real-time behavioral cloning.
* All trainer hyperparameters now exist in `.yaml` files.
* `PPO.ipynb` removed.
* LSTM model added.
* More dynamic buffer class to handle a greater variety of scenarios.
* Add support for stacking the past n states to allow the network to learn temporal dependencies (a minimal sketch follows this list).
* Add Banana Collector environment for demonstrating partially observable multi-agent environments.
* Add 3DBall Hard, which lacks velocity information in its state representation. Used as a test for the LSTM and state-stacking features.
* Rework Tennis environment to be continuous control and trainable in 100k steps.
* Add ability to seed learning (numpy, tensorflow, and Unity) with `--seed` flag.
* Add `maxStepReached` flag to Agents and Academy.
* Change the way value bootstrapping works in PPO to take advantage of timeouts.
* Default size of GridWorld changed to 5x5 in order to validate bootstrapping changes.
* Implement behavioral cloning for continuous/discrete control, feed-forward/recurrent networks, and state/visual observations.
* Re-organize folder structure in anticipation of unitytrainers as a package.
* Create demo environment BananaImitation to validate behavioral cloning.
* Fixes #336
* Fix Basic environment to properly reflect number of states.
* Fix discrete states when using stacked states.
* Add trained model for Basic environment.
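Two of the items above (the LSTM model and state stacking) exist to give the network temporal context. A minimal sketch of how past-state stacking might look, assuming NumPy; the class and method names are hypothetical, not the trainer's actual API:

```python
from collections import deque

import numpy as np

class StateStacker:
    """Hypothetical sketch: keep the past n states and concatenate them
    into a single network input so a feed-forward model can see short
    temporal context."""

    def __init__(self, state_size, n_stack):
        # Start with zero-filled states so the input size is fixed.
        self.buffer = deque(
            [np.zeros(state_size) for _ in range(n_stack)], maxlen=n_stack
        )

    def add(self, state):
        self.buffer.append(np.asarray(state, dtype=np.float32))
        return np.concatenate(self.buffer)
```

With two stacked states, a 3DBall Hard agent can in principle recover velocity from consecutive positions, which is why that scene serves as a test for this feature.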
* On Demand Decision: use `RequestDecision` and `RequestAction`
* New Agent Inspector: use it to set On Demand Decision
* New BrainParameters interface
* LSTM memory size is now set in Python
* New C# API
* Semantic Changes
* Replaced RunMDP
* New Bouncer Environment to test On Demand Decision
* Add config for crawler and change crawler scene
* Changed number of crawlers in scene to 12
* Changed max steps for crawlers to 5000
* New hyperparameters and a newly trained crawler model
* Clean up crawler code and improve efficiency
* [New Bouncer] Revamped the Bouncer to be in 3D
* [Bouncer Configuration file] Added the BouncerBrain configuration
* [Documentation] Added the Bouncer to the documentation page
* [Fixes] Fixed lines that were too long and a documentation typo
* Slight adjustments to bouncer environment
* Don't default to internal brain on bouncer
RayPerception moved to a component that is now used by Banana, Soccer, Hallway, and Push Block.
Converted Push Block to use RayPerception for local perception and retrained model.
Re-worked Hallway to be more extensible.
Fixes the following issues:
* Missing component reference in the BananaRL environment.
* The neural network for multiple visual observations was not generated properly.
* Episode time-out value-estimate bootstrapping used the incorrect observation as input (a sketch of timeout-aware bootstrapping follows).
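The time-out fix above concerns how PPO bootstraps value estimates when an episode is cut off by `maxStepReached` rather than ending naturally. A minimal sketch of the idea, assuming NumPy; the function and flag names are illustrative:

```python
import numpy as np

def discounted_returns(rewards, last_value, done, max_step_reached, gamma=0.99):
    """If the episode timed out (max_step_reached), bootstrap from the
    value estimate of the final observation instead of zero, since the
    episode did not actually terminate."""
    running = last_value if (max_step_reached or not done) else 0.0
    returns = np.zeros(len(rewards))
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```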
This PR makes the following changes:
* Moves clipping of the continuous control model into the model itself. Output is now always in [-1, 1].
* Internal model values are now clipped to [-3, 3] before being rescaled to [-1, 1] for output. This improves training performance by providing a wider range of values within which the pdf of the Gaussian can fall. Output of [-1, 1] is used to be more environment-creator friendly.
* Fixes issue where epsilon was erroneously being used to reconstruct old probabilities during PPO update, leading to reduced learning performance.
* Introduce a ScaleAction() function within Python to easily rescale values from [-1, 1] to an arbitrary range (see the sketch after this list).
* Re-train all CC models using the improved algorithm. All performance levels are equal or improved; in the case of Crawler, the improvement is drastic.
* Update documentation appropriately.
* Made miscellaneous minor code style and optimization improvements within environments.
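The two numeric conventions above can be summarized in a few lines. A minimal sketch, assuming NumPy; ScaleAction's real signature is not given in these notes, so the one below is illustrative:

```python
import numpy as np

def clip_and_rescale(raw_output):
    """Model-side convention: clip internal values to [-3, 3], then
    rescale so emitted actions always lie in [-1, 1]."""
    return np.clip(raw_output, -3.0, 3.0) / 3.0

def scale_action(action, low, high):
    """Illustrative ScaleAction() helper: map an action from [-1, 1]
    to the arbitrary range [low, high]."""
    return low + 0.5 * (np.asarray(action) + 1.0) * (high - low)
```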
* Adds an implementation of Curiosity-driven Exploration by Self-supervised Prediction (https://arxiv.org/abs/1705.05363) to the PPO trainer (a sketch of the intrinsic reward follows this list).
* To enable it, set the `use_curiosity` flag to true in the hyperparameter file.
* Includes a refactor of the unitytrainers model code to accommodate the new feature.
* Adds a new Pyramids environment (with documentation). The environment contains a sparse reward and can only be solved using PPO+Curiosity.
* Revamps agent code for walker and crawler environments to use shared JointDriveController system.
* Crawler has been reworked to be very cute.
* Crawler & Walker environments have been reworked to be visually consistent.
* Added Dynamic Crawler scene.
* All scenes re-trained and new models added.
* Documentation changes.
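For reference, the curiosity signal from Pathak et al. (2017) rewards the agent in proportion to the forward model's prediction error in a learned feature space. A minimal NumPy sketch; the feature encodings and the eta scale are assumed inputs:

```python
import numpy as np

def curiosity_reward(phi_next_pred, phi_next, eta=0.01):
    """Intrinsic reward: half the squared forward-model prediction error
    in feature space, scaled by eta. Inputs are assumed to be batches of
    encoded next-state features."""
    return 0.5 * eta * np.sum(np.square(phi_next_pred - phi_next), axis=-1)
```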
* Added missing declaration to docs sample code.
* Added the pretrained model as the default graph in the Internal brain of the Tennis scene
* Disabled PlayerBrain in Tennis by default.
* Removed accidental config.
* Updated the Pyramids model and changed max_steps to reflect the new number of steps required to achieve ~1.8 cumulative reward
* Adding model for 3D Balance Ball.
* Adding LearningBrain to the Broadcast Hub.
* Removed CrawlerPlayer Brain
* Renamed CrawlerLearning -> CrawlerStaticLearning
* Update Hallway models
* Attaching model to brain for Hallway
* Attaching model to 3DBall Brain.
* Updated CrawlerLearning -> CrawlerStaticLearning in the trainer config.
* Adding Reacher model
* Remove model specification in Hallway Brain asset
* Removing model specification from 3Dball scene
* Adding crawler model file
* Specifying learning brain as default for crawler
* Create a new class (RewardSignal) that represents a reward signal (a sketch follows this list).
* Add value heads for each reward signal in the PPO model.
* Make summaries agnostic to the type of reward signals, and log weighted rewards per reward signal.
* Move extrinsic and curiosity rewards into this new structure.
* Allow defining multiple reward signals in YAML file. Add documentation for this new structure.
* Don't use a 0 value bootstrap for GAIL and Curiosity
* Add gradient penalties to the GAN to help with stability (a sketch follows this list).
* Add gail_config.yaml with GAIL examples
* Cleaned up trainer_config.yaml and removed unnecessary gammas
* Documentation updates
* Code cleanup
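A minimal sketch of the reward-signal abstraction described above; the class and method names are illustrative, not the trainer's actual interface:

```python
class RewardSignal:
    """Hypothetical base class: each signal owns its own discount (gamma)
    and weight (strength), and the PPO model keeps a separate value head
    per signal."""

    def __init__(self, gamma, strength):
        self.gamma = gamma
        self.strength = strength

    def evaluate(self, batch):
        """Return the per-step rewards this signal contributes."""
        raise NotImplementedError

class ExtrinsicSignal(RewardSignal):
    def evaluate(self, batch):
        # The extrinsic signal passes environment rewards through.
        return self.strength * batch["env_rewards"]
```

Curiosity and GAIL then become further subclasses, and the summaries can log each signal's weighted reward independently.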
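The gradient penalty mentioned above is, in spirit, the WGAN-GP stabilizer (Gulrajani et al., 2017). A TensorFlow 1.x-style sketch; the discriminator callable and batch shapes are assumptions:

```python
import tensorflow as tf

def gradient_penalty(discriminator, expert_batch, policy_batch, weight=10.0):
    """WGAN-GP-style penalty: push the discriminator's gradient norm
    toward 1 on random interpolations between expert and policy batches.
    `discriminator` is an assumed callable returning a scalar per sample."""
    alpha = tf.random_uniform(tf.stack([tf.shape(expert_batch)[0], 1]))
    interpolated = alpha * expert_batch + (1.0 - alpha) * policy_batch
    grads = tf.gradients(discriminator(interpolated), [interpolated])[0]
    grad_norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=-1) + 1e-8)
    return weight * tf.reduce_mean(tf.square(grad_norm - 1.0))
```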
* new env styles rebased on develop
* added new trained models
* renamed food collector platforms
* reduce training timescale on WallJump from 100 to 10
* uncheck academy control on walljump
* new banner image
* rename banner file
* new example env images
* add foodCollector image
* change Banana to FoodCollector and update image
* change bouncer description to include green cube
* update image
* update gridworld image
* cleanup prefab names and tags
* updated soccer env to reference purple agent instead of red
* remove unused mats
* rename files
* remove more unused tags
* update image
* change platform to agent cube
* update text. change platform to agent's head
* cleanup
* cleaned up weird unused meta files
* add new wall jump nn files and rename a prefab
* walker change stacked states from 5 to 1
Walker collects physics observations, so stacked states are not needed.
* 1 to 1 Brain to Agent
This is a work in progress
In this PR :
- Deleted all Brain Objects
- Moved the BrainParameters into the Agent
- Gave the Agent a Heuristic method (see Balance Ball for example)
- Modified the Communicator and ModelRunner: `Put` can only take one agent at a time
- Made the IBrain interface with RequestDecision and DecideAction methods
No changes made to Python
[Design Doc](https://docs.google.com/document/d/1hBhBxZ9lepGF4H6fc6Hu6AW7UwOmnyX3trmgI3HpOmo/edit#)
* Removing editorconfig
* Updating Balance Ball scene
* Fixed a grammar mistake
* Clearing the Agents of the Model runner
* Added Documentation on IBrain
* Modified comments on GiveModel
* Introduced a factory
* Split Learning Brain in two
* Changes to walljump
* Fixing the Unit tests
* Renaming the Brain to Policy
* Heuristic now has priority over training
* Edited code comments
* Fixing bugs
* Develop one to one scene edits...