* made BrainParameters a class to set default values
Modified the error message if the state is discrete
* Add discrete state support to PPO and provide discrete state example environment
* Add flexibility to continuous control as well
* Finish PPO flexible model generation implementation
* Fix formatting
* Support color observations
* Add best practices document
* bug fix for non square observations
* Update Readme.md
* Remove scipy dependency
* Add installation doc
* added broadcast to the player and heuristic brain.
Allows the python API to record actions taken along with the states and rewards
* removed the broadcast checkbox
Added a Handshake method for the communicator
The academy will try to handshake regardless of the brains present
Player and Heuristic brains will send their information through the communicator but will not receive commands
* bug fix : The environment only requests actions from external brains when unique
* added warning in case no brains are set to external
* fix on the instantiation of coreBrains,
fix on the conversion of actions to arrays in the BrainInfo received from step
* default discrete action is now 0
bug fix for discrete broadcast action (the action size should be one in Agents.cs)
modified Tennis so that the default action is no action
modified TemplateDecision.cs to ensure non-null values are sent from Decide() and MakeMemory()
* minor fixes
* need to convert the s...
* More efficiently allocate memory when sending states
* Code clean-up
* Additional changes
* More GC reduction
* Remove state list initialization from example environments
* Use built-in json tool to serialize state message
* Remove commented code
* Use more efficient CompareTag
* Comments before code
* Use type inference where appropriate
Greatly simplified GridWorld code. It now also only uses a visual observation rather than state vector in order to demonstrate learning purely from a visual input.
* `learn.py` is now main script for training brains.
* Simultaneous multi-brain training is now possible.
* `ghost-trainer` allows for proper training in adversarial scenarios.
* `imitation-trainer` provides a basic implementation of real-time behavioral cloning.
* All trainer hyperparameters now exist in `.yaml` files.
* `PPO.ipynb` removed.
* LSTM model added.
* More dynamic buffer class to handle greater variety of scenarios.
* Add support for stacking past n states to allow network to learn temporal dependencies.
* Add Banana Collector environment for demonstrating partially observable multi-agent environments.
* Add 3DBall Hard which lacks velocity information in state representation. Used as test for LSTM and state-stacking features.
* Rework Tennis environment to be continuous control and trainable in 100k steps.
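A minimal sketch of the state-stacking idea mentioned above, assuming states are flat vectors. `StateStacker` is a hypothetical helper written for illustration and is not the buffer class used by the trainers.

```python
from collections import deque

import numpy as np


class StateStacker:
    """Keeps the last n state vectors and returns them concatenated (illustrative only)."""

    def __init__(self, state_size, n):
        self.state_size = state_size
        self.n = n
        self.frames = deque(maxlen=n)
        self.reset()

    def reset(self):
        # Zero-pad so the stacked vector has a fixed length from the first step.
        self.frames.clear()
        for _ in range(self.n):
            self.frames.append(np.zeros(self.state_size, dtype=np.float32))

    def push(self, state):
        # The oldest state drops out automatically because of maxlen.
        self.frames.append(np.asarray(state, dtype=np.float32))
        return np.concatenate(list(self.frames))
```

Feeding the stacked vector to the network gives it access to recent history (for example, enough to infer velocity from positions) without a recurrent layer.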
* added a write text method to the trainer so the hyperparameters can easily be logged as a dictionary. Note: needs tensorflow version r1.2 or above
* added message if impossible to write text summary in Tensorboard
Replaced the print statements with logging statements in the exception.py file
Uses the same logger as the environment one
named the logger unityagents
* Add ability to seed learning (numpy, tensorflow, and Unity) with `--seed` flag.
* Add `maxStepReached` flag to Agents and Academy.
* Change way value bootstrapping works in PPO to take advantage of timeouts.
* Default size of GridWorld changed to 5x5 in order to validate bootstrapping changes.
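A rough sketch of what bootstrapping on timeouts can look like, assuming plain discounted returns rather than the trainer's actual advantage computation. The function names and the `value_estimate` argument are hypothetical; `max_step_reached` stands in for the flag named above.

```python
import numpy as np


def discounted_returns(rewards, gamma, bootstrap_value=0.0):
    """Discounted return per step, seeded with a value for the state after the last step."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def episode_returns(rewards, gamma, max_step_reached, value_estimate):
    # If the episode was cut off by a step limit, the task did not really end,
    # so the critic's estimate (assumed input) replaces the missing future reward.
    # A true terminal state contributes no future reward.
    bootstrap = value_estimate if max_step_reached else 0.0
    return discounted_returns(rewards, gamma, bootstrap)
```

With rewards `[1, 1]`, `gamma=0.5`, and a value estimate of 4, a timed-out episode yields returns `[2.5, 3.0]`, while a truly finished one yields `[1.5, 1.0]`.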
* Implement behavioral cloning for cc/dc, fc/rnn, state/observations.
* Re-organize folder structure in anticipation of unitytrainers as a package.
* Create demo environment BananaImitation to validate behavioral cloning.
* Fixes #336
* Reorganized python tests into separate folder, and made individual test files for different (sub) modules.
* Add tests for trainer_controller, PPO, and behavioral cloning. More to come soon.
* Minor bug fixes discovered while writing tests.
* Reworked GridWorld to reset much faster.
* Cleaned ObservationToTex and reworked GetObservationMatrixList to be 3x faster.
* Fix Basic environment to properly reflect number of states.
* Fix discrete states when using stacked states.
* Add trained model for Basic environment.
* On Demand Decision : Use RequestDecision and RequestAction
* New Agent Inspector : Use it to set On Demand Decision
* New BrainParameters interface
* LSTM memory size is now set in python
* New C# API
* Semantic Changes
* Replaced RunMDP
* New Bouncer Environment to test On Demand Decision
* [Previous Text Actions] Renamed previous_action to previous_vector_action
added previous_text_action to the BrainInfo
* [Semantics] Carried the modifications to the semantics of previous_vector_action to the trainers
* Add config for crawler, and change crawler scene
* Changed number of crawlers in scene to 12
* Changed Max-steps for crawlers to 5000
* Newer hyperparameters and newly trained crawler model
* Clean up crawler code, and improve efficiency
* [New Bouncer] Revamped the Bouncer to be in 3D
* [Bouncer Configuration file] Added the BouncerBrain configuration
* [Documentation] Added the Bouncer to the documentation page
* [Fixes] Fixed lines too long and the documentation typo
* Slight adjustments to bouncer environment
* Don't default to internal brain on bouncer
RayPerception moved to a component that is now used by Banana, Soccer, Hallway, and Push Block.
Converted Push Block to use RayPerception for local perception and retrained model.
Re-worked Hallway to be more extensible.
* Fixes internal brain for Banana Imitation.
* Fixes Discrete Control training for Imitation Learning.
* Fixes Visual Observations in internal brain with non-square inputs.
Fixes the following issues:
* Missing component reference in BananaRL environment.
* Neural Network for multiple visual observations was not properly generated.
* Episode time-out value estimate bootstrapping used incorrect observation as input.
This PR makes the following changes:
* Moves clipping of continuous control model into model itself. Output is now always [-1, 1].
* Internal model values are now clipped between [-3, 3] before being rescaled to [-1, 1] for output. This improves training performance by providing a wider range of values within which the PDF of the Gaussian can fall. Output of [-1, 1] is used to be more environment-creator friendly.
* Fixes issue where epsilon was erroneously being used to reconstruct old probabilities during PPO update, leading to reduced learning performance.
* Introduce ScaleAction() function within python to easily rescale values from [-1, 1] to arbitrary range.
* Re-train all CC models using improved algorithm. All performance levels are equal or improved. In the case of Crawler, improvement is drastic.
* Update documentation appropriately.
* Made miscellaneous minor code style and optimization improvements within environments.
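The clipping and rescaling described in this PR can be sketched as follows. Both functions are illustrative stand-ins: `squash_output` assumes the rescale is a plain division by the clip bound, and `scale_action` is a hypothetical linear map, not the actual `ScaleAction()` implementation.

```python
import numpy as np


def squash_output(raw, clip=3.0):
    """Clip raw model values to [-clip, clip], then rescale to [-1, 1]."""
    return np.clip(raw, -clip, clip) / clip


def scale_action(action, low, high):
    """Linearly map an action in [-1, 1] to the range [low, high]."""
    action = np.asarray(action, dtype=np.float64)
    return low + (action + 1.0) * 0.5 * (high - low)
```

An environment that expects, say, a force in [0, 10] could then call `scale_action(squash_output(raw), 0.0, 10.0)` on the model output.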
* [Cold Fix] Split the way cumulative rewards and episode length are counted
The reward is appended at each step to the cumulative reward
The episode count is ONLY incremented when d_t+1 is false
Fixes the issue raised by @hsaikia in #552
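One possible reading of the split counting described above, as a sketch only: `track_episode` and its `(reward, next_done)` input format are assumptions made for illustration and do not mirror the trainer code.

```python
def track_episode(steps):
    """steps: iterable of (reward, next_done) pairs for a single agent."""
    cumulative_reward = 0.0
    episode_length = 0
    for reward, next_done in steps:
        # The reward is always added, including on the terminal transition.
        cumulative_reward += reward
        # The length counter only advances while the next step is not done.
        if not next_done:
            episode_length += 1
    return cumulative_reward, episode_length
```

Under this reading the final reward of an episode is no longer dropped from the cumulative total, while the length counter behaves as before.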
Added the memory_size variable to the BC model
Added memory_size and recurrent_out to the output nodes of the graph when using BC with LSTM
* some random change so that I can create this PR
* docs update for TensorFlowSharp new version
* changed the links to the new unitypackage file
* resolved conflicts, updated the pictures for CUDA 9.0
* fixed a typo
* resolved arthur's comment
* blurred the usernames
* modified the AWS doc
* resolved Vince's comment
* [containers] Enables container support for scenes that use visual observations
* [Initial Commit] Works only with simple balance ball
* [Optimization] Store the academy in the brainBatcher as a temporary measure
* [Modifications] Made it work from the editor as a prototype
* [Made socket communicator and reimplemented all functionalities]
* [Forgotten file] removed .meta file
* [Forgot the meta file]
* [Metafile] deleted metafile
* [Comments] Removed dead code
* [Comments] Added some descriptions
* [Bug Fix] Multi brain scenario
* [improved AgentInfo converter]
* [Optimization] Remove VectorObs since StackedVectorObs is present in the AgentInfo protobuf object
* [Timeout] Implemented a timeout for the rpc communicator in Unity
* [Libraries] Added the C# Protobuf and Grpc libraries
* [Requirements] Added protobuf 3.5.2 to the requirements
* [Code Formatting] Removed dead code and split some lines
...
* Adds implementation of Curiosity-driven Exploration by Self-supervised Prediction (https://arxiv.org/abs/1705.05363) to PPO trainer.
* To enable, set use_curiosity flag to true in hyperparameter file.
* Includes refactor of unitytrainers model code to accommodate new feature.
* Adds new Pyramids environment (w/ documentation). Environment contains sparse reward, and can only be solved using PPO+Curiosity.
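For orientation, the intrinsic reward in the referenced paper is the scaled squared error of a forward model's prediction in feature space. The sketch below assumes the feature vectors are already computed; `intrinsic_reward` and `strength` are illustrative names, not identifiers from the trainer.

```python
import numpy as np


def intrinsic_reward(predicted_next_features, next_features, strength=0.01):
    """Curiosity bonus: strength / 2 * squared prediction error, one value per sample."""
    predicted = np.asarray(predicted_next_features, dtype=np.float64)
    actual = np.asarray(next_features, dtype=np.float64)
    return strength * 0.5 * np.sum((predicted - actual) ** 2, axis=-1)
```

Transitions the forward model predicts poorly earn a larger bonus, which is what pushes the agent to explore when the extrinsic reward is sparse.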
If the agent is done immediately after spawning, its stats are empty because at least 2 successive experiences are needed to create them.
With a default value of 0 specified, the error no longer appears.
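A minimal sketch of the default-value guard, assuming the stats are kept as a plain list; `mean_or_default` is a hypothetical helper, not the function touched by this fix.

```python
import numpy as np


def mean_or_default(values, default=0.0):
    """Return the mean of values, or default when no values were collected."""
    return float(np.mean(values)) if len(values) > 0 else default
```

Without the guard, `np.mean([])` returns `nan` and emits a runtime warning, which is the kind of failure a default of 0 avoids.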
* Revamps agent code for walker and crawler environments to use shared JointDriveController system.
* Crawler has been reworked to be very cute.
* Crawler & Walker environments have been reworked to be visually consistent.
* Added Dynamic Crawler scene.
* All scenes re-trained and new models added.
* Documentation changes.
* Added missing declaration to docs sample code.
* Added pretrained model as default graph in Internal brain of Tennis scene
* Disabled PlayerBrain in Tennis by default.
* Removed accidental config.
- Raises MetaCurriculumError when curriculum_folder is not a folder.
- Removed the ability to set curriculum_folder to None.
trainer_controller.py has been refactored to not depend on this
functionality which will make curriculums more stable.
- The old Curriculum object would accept None
as a location for the curriculum. If the
location was None, it would return default
values as its config and lesson number.
- The new MetaCurriculum does not accept
None as a location for the curriculum
folder. This was done to remove unnecessary
edge case functionality from curriculums.
- None checks have been added into
trainer_controller. In the future,
it should be possible to better refactor
trainer_controller so that these None
checks can be removed. This is preferable
to hard-coding default behavior into
MetaCurriculum objects when a metacurriculum
would not even be in place.
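The folder check described above could look roughly like this. The exception class here is a local stand-in with the same name, and `list_curriculum_files` is a hypothetical helper; the real MetaCurriculum constructor may differ.

```python
import os


class MetaCurriculumError(Exception):
    """Local stand-in for the error named in this change (illustrative only)."""


def list_curriculum_files(curriculum_folder):
    # None is rejected explicitly instead of being mapped to default values.
    if curriculum_folder is None or not os.path.isdir(curriculum_folder):
        raise MetaCurriculumError(
            "{} is not a folder containing curriculum files.".format(curriculum_folder)
        )
    return sorted(os.listdir(curriculum_folder))
```

Callers that may run without a curriculum then decide that up front, which is where the None checks in trainer_controller come in.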
* [Initial Commit]
Modified the model.py file and the ppo/trainer.py file to use masked actions
* Preliminary modifications to the python side of the code to enable action masking
* Preliminary modifications to the C# side of the code to enable action masking
* Preliminary modifications to the communication side of the code to enable action masking
* Implemented action masking for BC
Note : The actions of the teacher are not masked
* More error messages for the action masking
* fix pytests
* Added Documentation
* Address comment
* Addressed Comments on docs
* Addressed second comment on docs
* Addressed comments for the python side of the code
* Created the action masker and associated unit tests
* Addressed comments on the C# side
* Addressed the comment regarding action_masking_name
* Addressed the comments
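As a sketch of the general idea behind action masking for a single discrete branch: disallowed actions get zero probability before sampling. The function name and the convention that `True` means "allowed" are assumptions for this example and may not match the actual model code.

```python
import numpy as np


def masked_action_probs(logits, action_mask):
    """Softmax over logits with disallowed actions forced to probability 0."""
    logits = np.asarray(logits, dtype=np.float64)
    mask = np.asarray(action_mask, dtype=bool)  # assumed convention: True = allowed
    if not mask.any():
        raise ValueError("At least one action must remain available.")
    # Masked entries become -inf so they contribute exp(-inf) = 0.
    masked = np.where(mask, logits, -np.inf)
    masked = masked - masked.max()  # subtract the max for numerical stability
    exp = np.exp(masked)
    return exp / exp.sum()
```

For example, `masked_action_probs([0, 0, 0], [True, False, True])` gives the middle action probability 0 and splits the rest evenly.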