* `learn.py` is now the main script for training brains.
* Simultaneous multi-brain training is now possible.
* `ghost-trainer` allows for proper training in adversarial scenarios.
* `imitation-trainer` provides a basic implementation of real-time behavioral cloning.
* All trainer hyperparameters now exist in `.yaml` files.
* `PPO.ipynb` removed.
* LSTM model added.
* More dynamic buffer class to handle a greater variety of scenarios.
* Add support for stacking the past n states to allow the network to learn temporal dependencies (see the sketch below).
* Add Banana Collector environment for demonstrating partially observable multi-agent environments.
* Add 3DBall Hard, which lacks velocity information in its state representation. Used as a test for the LSTM and state-stacking features.
* Rework Tennis environment to be continuous control and trainable in 100k steps.
* Implement behavioral cloning for cc/dc, fc/rnn, state/observations.
* Re-organize folder structure in anticipation of unitytrainers as a package.
* Create demo environment BananaImitation to validate behavioral cloning.
* Fixes #336
Fixes the following issues:
* Missing component reference in BananaRL environment.
* The neural network for multiple visual observations was not properly generated.
* Episode time-out value estimate bootstrapping used the incorrect observation as input.
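For illustration, here is a minimal sketch of the state-stacking feature listed above: the last n vector states are concatenated into a single network input so the network can infer quantities (such as velocity) that one state alone does not contain. The `StateStacker` class is a hypothetical helper, not the trainer's actual API.

```python
from collections import deque

import numpy as np


class StateStacker:
    """Keeps the last `num_stack` vector states and concatenates them
    into a single network input. Hypothetical helper for illustration."""

    def __init__(self, state_size: int, num_stack: int):
        self.state_size = state_size
        self.num_stack = num_stack
        # Start with zeros so the stacked input has a fixed size from step 0.
        self.states = deque(
            [np.zeros(state_size, dtype=np.float32) for _ in range(num_stack)],
            maxlen=num_stack,
        )

    def add(self, state: np.ndarray) -> np.ndarray:
        """Append the newest state and return the stacked input."""
        self.states.append(np.asarray(state, dtype=np.float32))
        return np.concatenate(list(self.states))  # shape: (state_size * num_stack,)


# Example: stack the last 3 observations of a 2-dimensional state.
stacker = StateStacker(state_size=2, num_stack=3)
print(stacker.add(np.array([0.1, -0.2])).shape)  # (6,)
```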
- The old Curriculum object would accept None as a location for the curriculum. If the location was None, it would return default values as its config and lesson number.
- The new MetaCurriculum does not accept None as a location for the curriculum folder. This removes unnecessary edge-case functionality from curricula.
- None checks have been added to trainer_controller. In the future, it should be possible to refactor trainer_controller so that these None checks can be removed; this is preferable to hard-coding default behavior into MetaCurriculum objects when a meta-curriculum would not even be in place.
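As a rough sketch, assuming simplified names (this is not the actual trainer_controller or MetaCurriculum code), the None check described above amounts to only consulting the meta-curriculum when one was actually configured:

```python
from typing import Dict, Optional


class MetaCurriculum:
    """Stand-in for the real MetaCurriculum, which requires a curriculum folder."""

    def __init__(self, curriculum_folder: str):
        if curriculum_folder is None:
            raise ValueError("MetaCurriculum requires a curriculum folder.")
        self.curriculum_folder = curriculum_folder

    def get_config(self) -> Dict[str, float]:
        return {"difficulty": 1.0}  # placeholder lesson config


def reset_parameters(meta_curriculum: Optional[MetaCurriculum]) -> Dict[str, float]:
    # The None check lives in the caller, not inside MetaCurriculum:
    # when no meta-curriculum is in place, fall back to an empty config.
    if meta_curriculum is None:
        return {}
    return meta_curriculum.get_config()
```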
* Move 'take_action' into Policy class
This refactor is part of Actor-Trainer separation. Since policies
will be distributed across actors in separate processes which share
a single trainer, taking an action should be the responsibility of
the policy.
Along the way, this makes a few smaller changes:
* Combines `take_action` logic between trainers, making it more
generic
* Adds an `ActionInfo` data class to be more explicit about the
data returned by the policy, only used by TrainerController and
policy for now.
* Moves trainer stats logic out of `take_action` and into
`add_experiences`
* Renames 'take_action' to 'get_action'
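A minimal sketch of the shape of this refactor, with simplified and partly hypothetical field names (the real `ActionInfo` and policy classes carry more state):

```python
from typing import Any, Dict, NamedTuple

import numpy as np


class ActionInfo(NamedTuple):
    """Explicit container for what the policy returns. Fields are
    illustrative; the real class may differ."""

    action: np.ndarray        # actions chosen for each agent
    value: np.ndarray         # value estimates, consumed by the trainer
    outputs: Dict[str, Any]   # raw network outputs (log-probs, memories, ...)


class Policy:
    def get_action(self, vector_obs: np.ndarray) -> ActionInfo:
        """Formerly `take_action` on the trainer; now the policy decides.
        This stub samples random continuous actions for illustration only."""
        num_agents = vector_obs.shape[0]
        action = np.random.uniform(-1.0, 1.0, size=(num_agents, 2))
        value = np.zeros(num_agents, dtype=np.float32)
        return ActionInfo(action=action, value=value, outputs={"pre_action": action})
```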
* WIP precommit on top level
* update CI
* circleci fixes
* intentionally fail black
* use --show-diff-on-failure in CI
* fix command order
* rebreak a file
* apply black
* WIP enable mypy
* run mypy on each package
* fix trainer_metrics mypy errors
* more mypy errors
* more mypy
* Fix some partially typed functions
* types for take_action_outputs
* fix formatting
* cleanup
* generate stubs for proto objects
* fix ml-agents-env mypy errors
* disallow-incomplete-defs for gym-unity
* Add CI notes to CONTRIBUTING.md
* Create a new class (RewardSignal) that represents a reward signal.
* Add value heads for each reward signal in the PPO model.
* Make summaries agnostic to the type of reward signals, and log weighted rewards per reward signal.
* Move extrinsic and curiosity rewards into this new structure.
* Allow defining multiple reward signals in the trainer YAML file. Add documentation for this new structure.
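A hedged sketch of the structure described above, assuming simplified constructors and an `evaluate` method that may not match the real API: each reward signal produces a per-agent reward, carries a strength used to weight it, and gets its own value head/discount.

```python
from typing import Dict

import numpy as np


class RewardSignal:
    """Base class: every reward signal produces a per-agent reward and has a
    strength used to weight it; each signal also gets its own value head."""

    def __init__(self, strength: float, gamma: float):
        self.strength = strength
        self.gamma = gamma

    def evaluate(self, batch: Dict[str, np.ndarray]) -> np.ndarray:
        raise NotImplementedError


class ExtrinsicRewardSignal(RewardSignal):
    """The environment reward, scaled by `strength`."""

    def evaluate(self, batch: Dict[str, np.ndarray]) -> np.ndarray:
        return self.strength * batch["environment_rewards"]


# Reward signals as they might be built from a YAML config:
reward_signals = {
    "extrinsic": ExtrinsicRewardSignal(strength=1.0, gamma=0.99),
    # "curiosity": CuriosityRewardSignal(strength=0.01, gamma=0.99),  # analogous
}
total_reward = sum(
    sig.evaluate({"environment_rewards": np.array([1.0, 0.0])})
    for sig in reward_signals.values()
)
```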
At each step, an unused `last_reward` variable in the TF graph is
updated in our PPO trainer. There are also related unused methods
in various places in the codebase. This change removes them.
Previously in v0.8 we added parallel environments via the
SubprocessUnityEnvironment, which exposed the same abstraction as
UnityEnvironment while actually wrapping many parallel environments
via subprocesses.
Wrapping many environments with the same interface as a single
environment had some downsides, however:
* Ordering needed to be preserved for agents across different envs,
complicating the SubprocessEnvironment logic
* Asynchronous environments with steps taken out of sync with the
trainer aren't viable with the Environment abstraction
This PR introduces a new EnvManager abstraction which exposes a
reduced subset of the UnityEnvironment abstraction and a
SubprocessEnvManager implementation which replaces the
SubprocessUnityEnvironment.
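A rough sketch of the kind of reduced interface an EnvManager exposes compared to a full UnityEnvironment; the method names and the `EnvironmentStep` container here are illustrative, not the exact API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class EnvironmentStep:
    """Illustrative container for one step of experience from one worker."""

    def __init__(self, worker_id: int, brain_info: Dict):
        self.worker_id = worker_id
        self.brain_info = brain_info


class EnvManager(ABC):
    """Reduced abstraction: the trainer asks for batches of steps rather
    than driving a single environment in lock-step."""

    @abstractmethod
    def step(self) -> List[EnvironmentStep]:
        """Advance the managed environments and return whatever steps are ready."""

    @abstractmethod
    def reset(self, config: Optional[Dict] = None) -> List[EnvironmentStep]:
        """Reset all managed environments."""

    @abstractmethod
    def close(self) -> None:
        """Shut down workers / subprocesses."""
```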
* Removes unused SubprocessEnvManager import in trainer_controller
* Removes unused `steps` argument to `TrainerController._save_model`
* Consolidates unnecessary branching for curricula in
`TrainerController.advance`
* Moves `reward_buffer` into `TFPolicy` from `PPOPolicy` and adds
`BCTrainer` support so that we don't have a broken interface /
undefined behavior when BCTrainer is used with curricula.
- Move common functions to trainer.py and model.py from ppo/trainer.py, ppo/policy.py, and ppo/model.py.
- Introduce RLTrainer class and move most of add_experiences and some common reward
signal code there. PPO and SAC will inherit from this, not so much BC Trainer.
- Add methods to Buffer to enable sampling, truncating, and save/loading.
- Add scoping to create encoders in model.py
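A hedged sketch of the buffer operations mentioned above (sampling, truncating, save/load); the class name, storage format, and method signatures are simplified for illustration and are not the actual Buffer API.

```python
import pickle
import random
from collections import defaultdict
from typing import Dict, List


class Buffer:
    """Simplified stand-in for the trainer buffer: one list per field."""

    def __init__(self):
        self.fields: Dict[str, List] = defaultdict(list)

    def num_experiences(self) -> int:
        return len(next(iter(self.fields.values()), []))

    def sample_mini_batch(self, batch_size: int) -> Dict[str, List]:
        """Sample a random mini-batch across all fields (useful for off-policy SAC)."""
        indices = random.sample(range(self.num_experiences()), batch_size)
        return {key: [values[i] for i in indices] for key, values in self.fields.items()}

    def truncate(self, max_length: int) -> None:
        """Drop the oldest experiences so the buffer stays bounded."""
        for key in self.fields:
            self.fields[key] = self.fields[key][-max_length:]

    def save(self, path: str) -> None:
        with open(path, "wb") as f:
            pickle.dump(dict(self.fields), f)

    def load(self, path: str) -> None:
        with open(path, "rb") as f:
            self.fields = defaultdict(list, pickle.load(f))
```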
We have been ignoring unused imports and star imports via flake8. These are
both bad practice and grow over time without automated checking. This
commit attempts to fix all existing import errors and add back the corresponding
flake8 checks.
* added team id and identifier concat to behavior parameters
* splitting brain params into brain name and identifiers
* set team id in prefab
* receives brain_name and identifier on the Python side (see the sketch below)
* rebased with develop
* Correctly calls concatBehaviorIdentifiers
* trainer_controller expects name_behavior_ids
* add_policy and create_policy separated
* adjusting tests to expect trainer.add_policy to be called
* fixing tests
* fixed naming ...
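For illustration, a hedged sketch of splitting a combined behavior identifier back into a brain name and team id on the Python side; the `?team=` separator format and the helper names are assumptions for this sketch, not the confirmed wire format.

```python
from typing import NamedTuple


class BehaviorIdentifiers(NamedTuple):
    name_behavior_id: str  # the full identifier sent from the C# side
    brain_name: str        # the brain/behavior name on its own
    team_id: int           # team id appended by the behavior parameters


def parse_behavior_id(name_behavior_id: str) -> BehaviorIdentifiers:
    """Split an identifier like 'Striker?team=0' into its parts.
    The separator format is an assumption for this sketch."""
    brain_name, _, team_part = name_behavior_id.partition("?team=")
    team_id = int(team_part) if team_part else 0
    return BehaviorIdentifiers(name_behavior_id, brain_name, team_id)


print(parse_behavior_id("Striker?team=1"))
```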
* [bug-fix] Increase height of wall in CrawlerStatic (#3650)
* [bug-fix] Improve performance for PPO with continuous actions (#3662)
* Corrected a typo in a name of a function (#3670)
OnEpsiodeBegin was corrected to OnEpisodeBegin in the Migrating.md document
* Add Academy.AutomaticSteppingEnabled to migration (#3666)
* Fix editor port in Dockerfile (#3674)
* Hotfix memory leak on Python (#3664)
* Hotfix memory leak on Python
* Fixing
* Fixing a bug in the heuristic policy. A decision should not be requested when the agent is done
* [bug-fix] Make Python able to deal with 0-step episodes (#3671)
* adding some comments
Co-authored-by: Ervin T <ervin@unity3d.com>
* Remove vis_encode_type from list of required (#3677)
* Update changelog (#3678)
* Shorten timeout duration for environment close (#3679)
The timeout duration for closing an environment was set to the
same duration as the timeout when waiting ...
* [bug-fix] Fix regression in --initialize-from feature (#4086)
* Fixed text in the GettingStarted page specifying the logdir for TensorBoard. It previously pointed to a summaries directory which no longer exists; results are now saved to the results dir. (#4085)
* [refactor] Remove nonfunctional `output_path` option from TrainerSettings (#4087)
* Reverting bug introduced in #4071 (#4101)
Co-authored-by: Scott <Scott.m.jordan91@gmail.com>
Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>
* Update Dockerfile
* Separate send environment data from reset (#4128)
* Fixed a typo on ML-Agents-Overview.md (#4130)
Removed a redundant "to" from a sentence, which was likely a typo in the document.
* Updated the badge’s link to point to the newest doc version
* Updated all of the doc links to point to release_3_doc
* Fix 3DBall and 3DBallHard SAC regressions (#4132)
* Move memory validation to settings
* Update docs
* Add settings test
* Update to release_3 in installation.md (#4144)
* rename to SideChannelManager +backcompat (#4137)
* Remove comment about logo with --help (#4148)
* [bugfix] Make FoodCollector heuristic playable (#4147)
* Make FoodCollector heuristic playable
* Update changelog
* script to check for old release links and references (#4153)
* Remove package validation suite from Project (#4146)
* RayPerceptionSensor: handle empty and invalid tags (#4155...
This change adds an export to .nn for each checkpoint generated by
RLTrainer and adds a NNCheckpointManager to track the generated
checkpoints and final model in training_status.json.
Co-authored-by: Jonathan Harper <jharper+moar@unity3d.com>
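A hedged sketch of what tracking exported checkpoints in training_status.json can look like; this `CheckpointManager` stand-in and its schema are illustrative, and the real NNCheckpointManager differs in structure and fields.

```python
import json
import os
import time
from typing import Dict


class CheckpointManager:
    """Records each exported .nn checkpoint and the final model in a
    training_status.json file. Simplified stand-in for NNCheckpointManager."""

    def __init__(self, status_path: str = "training_status.json"):
        self.status_path = status_path
        self.status: Dict[str, Dict] = {}
        if os.path.exists(status_path):
            with open(status_path) as f:
                self.status = json.load(f)

    def add_checkpoint(self, behavior_name: str, nn_path: str, steps: int) -> None:
        """Append a record for a newly exported .nn checkpoint."""
        entry = {"steps": steps, "file_path": nn_path, "creation_time": time.time()}
        behavior = self.status.setdefault(behavior_name, {"checkpoints": []})
        behavior["checkpoints"].append(entry)
        self._write()

    def set_final_model(self, behavior_name: str, nn_path: str) -> None:
        """Record the final exported model for a behavior."""
        self.status.setdefault(behavior_name, {"checkpoints": []})["final_model"] = nn_path
        self._write()

    def _write(self) -> None:
        with open(self.status_path, "w") as f:
            json.dump(self.status, f, indent=2)
```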