* Implement behavioral cloning for continuous and discrete control (cc/dc), feed-forward and recurrent networks (fc/rnn), and vector/visual observations.
* Re-organize folder structure in anticipation of unitytrainers as a package.
* Create demo environment BananaImitation to validate behavioral cloning.
* Fixes #336
* Reorganized Python tests into a separate folder, and made individual test files for different (sub)modules.
* Add tests for trainer_controller, PPO, and behavioral cloning. More to come soon.
* Minor bug fixes discovered while writing tests.
* Reworked GridWorld to reset much faster.
* Cleaned ObservationToTex and reworked GetObservationMatrixList to be 3x faster.
* On Demand Decision: Use RequestDecision and RequestAction
* New Agent Inspector: Use it to set On Demand Decision
* New BrainParameters interface
* LSTM memory size is now set in python
* New C# API
* Semantic Changes
* Replaced RunMDP
* New Bouncer Environment to test On Demand Decision
Fixes the following issues:
* Missing component reference in BananaRL environment.
* Neural Network for multiple visual observations was not properly generated.
* Episode time-out value estimate bootstrapping used incorrect observation as input.
Fixes the issue raised by @hsaikia in #552
Added the memory_size variable to the BC model
Added memory_size and recurrent_out to the output nodes of the graph when using BC with LSTM
* [containers] Enables container support for scenes that use visual observations
* [Initial Commit] Works only with simple balance ball
* [Optimization] Store the academy in the brainBatcher as a temporary measure
* [Modifications] Made it work from the editor as a prototype
* [Made socket communicator and reimplemented all functionalities]
* [Forgotten file] removed .meta file
* [Forgot the meta file]
* [Metafile] deleted metafile
* [Comments] Removed dead code
* [Comments] Added some descriptions
* [Bug Fix] Multi brain scenario
* [improved AgentInfo converter]
* [Optimization] Remove VectorObs since StackedVectorObs is present in the AgentInfo protobuf object
* [Timeout] Implemented a timeout for the rpc communicator in Unity
* [Libraries] Added the C# Protobuf and Grpc libraries
* [Requirements] Added protobuf 3.5.2 to the requirements
* [Code Formatting] Removed dead code and split some lines
...
* Adds implementation of Curiosity-driven Exploration by Self-supervised Prediction (https://arxiv.org/abs/1705.05363) to PPO trainer.
* To enable, set the use_curiosity flag to true in the hyperparameter file (see the sketch below).
* Includes refactor of unitytrainers model code to accommodate new feature.
* Adds new Pyramids environment (w/ documentation). Environment contains sparse reward, and can only be solved using PPO+Curiosity.
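A minimal sketch of how the curiosity settings mentioned above might look, written here as a Python dict for illustration. Only `use_curiosity` is named in the change itself; the other keys and values are assumptions.

```python
# Hypothetical sketch of a PPO trainer configuration enabling curiosity.
pyramids_trainer_config = {
    "trainer": "ppo",
    "use_curiosity": True,        # turn on the intrinsic curiosity module
    "curiosity_strength": 0.01,   # assumed: weight of the intrinsic reward
    "curiosity_enc_size": 128,    # assumed: size of the curiosity encoding
}
```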
- Raises MetaCurriculumError when curriculum_folder is not a folder.
- Removed the ability to set curriculum_folder to None.
trainer_controller.py has been refactored to not depend on this
functionality, which will make curriculums more stable.
- The old Curriculum object would accept None
as a location for the curriculum. If the
location was None, it would return default
values as its config and lesson number.
- The new MetaCurriculum does not accept
None as a location for the curriculum
folder. This was done to remove unnecessary
edge case functionality from curriculums.
- None checks have been added into
trainer_controller. In the future,
it should be possible to better refactor
trainer_controller so that these None
checks can be removed. This is preferable
to hard-coding default behavior into
MetaCurriculum objects for cases where no
metacurriculum is in place.
* Changing learn.py log messages.
- learn.py refers to the mlagents-learn script now.
- If a non-existent trainer config is passed, the log message
correctly points that out now.
* Changing the curriculum arg from file to dir.
* Fixing learn.py, trainer_controller.py, and Docker
- learn.py has been moved under trainers.
- this was a two line change
- learn.py will no longer be run as a main method
- docopt arguments are strings by default. learn.py now uses
this assumption to correctly parse arguments.
- trainer_controller.py now considers the Docker volume when
accepting a trainer config file path.
- the Docker container now uses mlagents-learn.
* Removing extraneous unity-volume ref.
* Documentation tweaks and updates (#1479)
* Add blurb about using the --load flag in the intro guide, and typo fix.
* Add section in tutorial to create multiple area learning environment.
* Add mention of Done() method in agent design
* fixed the windows ctrl-c bug
* fixed typo
* removed some unnecessary printing
* nothing
* make the import of the win api conditional
* removed the duplicate code
* added the ability to use python debugger on ml-agents
* added newline at the end, changed the import to be complete path
* changed the info.log into policy.export_model, changed the sys.platform to use startswith
* fixed a bug
* remove the printing of the path
* tweaked the info message to notify the user about the expected error message
* removed some logging according to comments
* removed the sys import
* Revert "Documentation tweaks and updates (#1479)"
This reverts commit 84ef07a4525fa8a89f4...
* Remove env creation logic from TrainerController
Currently TrainerController includes logic related to creating the
UnityEnvironment, which causes poor separation of concerns between
the learn.py application script, TrainerController and UnityEnvironment:
* TrainerController must know about the proper way to instantiate the
UnityEnvironment, which may differ from application to application.
This also makes mocking or subclassing UnityEnvironment more
difficult.
* Many arguments are passed by learn.py to TrainerController and passed
along to UnityEnvironment.
This change moves environment construction logic into learn.py, as part
of the greater refactor to separate trainer logic from actor / environment.
* Switched default Mac GFX API to Metal
* Added Barracuda pre-0.1.5
* Added basic integration with Barracuda Inference Engine
* Use predefined outputs the same way as for TF engine
* Fixed discrete action + LSTM support
* Switch Unity Mac Editor to Metal GFX API
* Fixed null model handling
* All examples converted to support Barracuda
* Added model conversion from Tensorflow to Barracuda
copied the barracuda.py file to ml-agents/mlagents/trainers
copied the tensorflow_to_barracuda.py file to ml-agents/mlagents/trainers
modified the tensorflow_to_barracuda.py file so it could be called from mlagents
modified ml-agents/mlagents/trainers/policy.py to convert the tf models to barracuda compatible .bytes file
* Added missing iOS BLAS plugin
* Added forgotten prefab changes
* Removed GLCore GFX backend for Mac, because it doesn't support Compute shaders
* Exposed GPU support for LearningBrain inference
...
* Move 'take_action' into Policy class
This refactor is part of Actor-Trainer separation. Since policies
will be distributed across actors in separate processes which share
a single trainer, taking an action should be the responsibility of
the policy.
This change makes a few smaller changes:
* Combines `take_action` logic between trainers, making it more
generic
* Adds an `ActionInfo` data class (sketched below) to be more explicit
about the data returned by the policy, only used by TrainerController
and the policy for now.
* Moves trainer stats logic out of `take_action` and into
`add_experiences`
* Renames 'take_action' to 'get_action'
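A rough sketch of what an `ActionInfo` data class like the one mentioned above could look like; the exact field names are assumptions for illustration only.

```python
from typing import Any, Dict, NamedTuple, Optional

class ActionInfo(NamedTuple):
    """Bundles what a policy returns for one step (field names are illustrative)."""
    action: Any                                # actions to apply in the environment
    value: Optional[Any] = None                # value estimates, if produced
    outputs: Optional[Dict[str, Any]] = None   # raw network outputs kept for the trainer
```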
This commit adds support for running Unity environments in parallel.
An abstract base class was created for UnityEnvironment, from which a
new SubprocessUnityEnvironment inherits.
SubprocessUnityEnvironment communicates with its workers through pipes
in order to send them commands that are run in parallel.
A few significant changes needed to be made as a side-effect:
* UnityEnvironments are created via a factory method (a closure)
rather than being directly created by the main process (see the
sketch after this list).
* In mlagents-learn "worker-id" has been replaced by "base-port"
and "num-envs", and worker_ids are automatically assigned across runs.
* BrainInfo objects now convert all fields to numpy arrays or lists to
avoid serialization issues.
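A hedged sketch of the factory-closure pattern described above. The import path matches the example elsewhere in this log, but the constructor arguments shown (`base_port` in particular) are assumptions about the API of that era.

```python
from mlagents.envs.environment import UnityEnvironment  # assumed import path for this era

def make_env_factory(env_path: str, base_port: int):
    """Return a closure that builds a UnityEnvironment for a given worker id."""
    def create_env(worker_id: int) -> UnityEnvironment:
        # each subprocess worker gets its own id, and therefore its own port
        return UnityEnvironment(
            file_name=env_path,
            base_port=base_port,   # assumed parameter name
            worker_id=worker_id,
        )
    return create_env
```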
On Windows the interrupt for subprocesses works in a different
way from OSX/Linux. The result is that child subprocesses and
their pipes may close while the parent process is still running
during a keyboard (ctrl+C) interrupt.
To handle this, this change adds handling for EOFError and
BrokenPipeError exceptions when interacting with subprocess
environments. Additional management is also added to be sure
that, when running parallel runs with the "num-runs" option,
the threads for each run are joined and KeyboardInterrupts are
handled.
These changes made the "_win_handler" we previously used to
manage interrupts on Windows unnecessary, so it has been
removed.
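A minimal sketch of the kind of guard described above, assuming a multiprocessing pipe connection to the worker; the function name is illustrative, not the actual ML-Agents code.

```python
def recv_from_worker(connection):
    """Read a message from a worker pipe, treating a dead pipe as a shutdown signal."""
    try:
        return connection.recv()
    except (EOFError, BrokenPipeError):
        # On Windows a Ctrl+C can close the child and its pipe while the
        # parent is still running, so a broken pipe is handled as "worker gone".
        return None
```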
When SubprocessUnityEnvironment was added, a change to the way the
"train_mode" flag was used by environments was included that was
intended to be part of a separate change set. This broke the CLI
'--slow' flag. This change undoes those changes, so that the slow /
fast simulation option works correctly.
As a minor additional change, the remaining tests from top level
'tests' folders have been moved into the new test folders.
When using parallel SubprocessUnityEnvironment instances along
with Academy Done(), a new step might be taken when reset should
have been called because some environments may have been done while
others were not (making "global done" less useful).
This change manages the reset on `global_done` at the level of the
environment worker, and removes the global reset from
TrainerController.
* WIP precommit on top level
* update CI
* circleci fixes
* intentionally fail black
* use --show-diff-on-failure in CI
* fix command order
* rebreak a file
* apply black
* WIP enable mypy
* run mypy on each package
* fix trainer_metrics mypy errors
* more mypy errors
* more mypy
* Fix some partially typed functions
* types for take_action_outputs
* fix formatting
* cleanup
* generate stubs for proto objects
* fix ml-agents-env mypy errors
* disallow-incomplete-defs for gym-unity
* Add CI notes to CONTRIBUTING.md
At each step, an unused `last_reward` variable in the TF graph is
updated in our PPO trainer. There are also related unused methods
in various places in the codebase. This change removes them.
Previously in v0.8 we added parallel environments via the
SubprocessUnityEnvironment, which exposed the same abstraction as
UnityEnvironment while actually wrapping many parallel environments
via subprocesses.
Wrapping many environments with the same interface as a single
environment had some downsides, however:
* Ordering needed to be preserved for agents across different envs,
complicating the SubprocessEnvironment logic
* Asynchronous environments with steps taken out of sync with the
trainer aren't viable with the Environment abstraction
This PR introduces a new EnvManager abstraction which exposes a
reduced subset of the UnityEnvironment abstraction and a
SubprocessEnvManager implementation which replaces the
SubprocessUnityEnvironment.
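A rough sketch of what a reduced EnvManager interface might look like; the method names and return types here are assumptions for illustration, not the exact ML-Agents API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

class EnvManager(ABC):
    """Reduced interface the trainer needs, regardless of how envs are run."""

    @abstractmethod
    def step(self) -> List[Any]:
        """Advance the managed environments and return the collected step results."""

    @abstractmethod
    def reset(self, config: Optional[Dict[str, float]] = None) -> List[Any]:
        """Reset the managed environments, optionally with reset parameters."""

    @abstractmethod
    def close(self) -> None:
        """Shut down the environments and any worker processes."""
```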
TrainerController depended on an external_brains dictionary with
brain params in its constructor but only used it in a single function
call. The same function call (start_learning) takes the environment
as an argument, which is the source of the external_brains.
This change removes the dependency of TrainerController on external
brains and removes the two class members related to external_brains
and retrieves the brains directly from the environment.
* Timer proof-of-concept
* micro optimizations
* add some timers
* cleanup, add asserts
* Cleanup (no start/end methods) and handle exceptions
* unit test and decorator
* move output code, add a decorator
* cleanup
* module docstring
* actually write the timings when done with training
* use __qualname__ instead
* add a few more timers
* fix mock import
* fix unit test
* don't need fwd reference
* cleanup root
* always write timers, add comments
* undo accidental change
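A hypothetical sketch of the decorator-based timing mentioned in this list; the real ML-Agents module and function names may differ.

```python
import functools
import time
from collections import defaultdict

_timings = defaultdict(float)  # accumulated seconds, keyed by qualified name

def timed(func):
    """Accumulate wall-clock time spent in `func`, keyed by its __qualname__."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            _timings[func.__qualname__] += time.perf_counter() - start
    return wrapper
```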
* Removes unused SubprocessEnvManager import in trainer_controller
* Removes unused `steps` argument to `TrainerController._save_model`
* Consolidates unnecessary branching for curricula in
`TrainerController.advance`
* Moves `reward_buffer` into `TFPolicy` from `PPOPolicy` and adds
`BCTrainer` support so that we don't have a broken interface /
undefined behavior when BCTrainer is used with curricula.
* Add Sampler and SamplerManager
* Enable resampling of reset parameters during training
* Documentation for Sampler and example YAML configuration file
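An illustrative sketch of the resampling idea, assuming a sampler is asked for a fresh value for each reset parameter whenever the environment resets; class and method names are assumptions, not the exact ML-Agents API.

```python
import random

class UniformSampler:
    """Draws a reset-parameter value uniformly from [min_value, max_value]."""
    def __init__(self, min_value: float, max_value: float):
        self.min_value = min_value
        self.max_value = max_value

    def sample_parameter(self) -> float:
        return random.uniform(self.min_value, self.max_value)

class SamplerManager:
    """Holds one sampler per reset parameter and produces a config per reset."""
    def __init__(self, samplers: dict):
        self.samplers = samplers

    def sample_all(self) -> dict:
        return {name: s.sample_parameter() for name, s in self.samplers.items()}

# e.g. vary the ball scale of 3DBall on every environment reset
manager = SamplerManager({"scale": UniformSampler(0.75, 3.0)})
reset_config = manager.sample_all()
```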
This fixes an issue where stopping the game when training in the Editor won't end training, due to the new asynchronous SubprocessEnvManager changes. Another minor change was made to move the `env_manager.close()` in TrainerController to the end of `start_learning` so that we are more likely to save the model if something goes wrong during the environment shutdown (this occurs sometimes on Windows machines).
This change moves trainer initialization outside of TrainerController,
reducing some of the constructor arguments of TrainerController and
setting up the ability for trainers to be initialized in the case where
a TrainerController isn't needed.
We have been ignoring unused imports and star imports via flake8. These are
both bad practice and grow over time without automated checking. This
commit attempts to fix all existing import errors and add back the corresponding
flake8 checks.
* Feature Deprecation : Online Behavioral Cloning
In this PR :
- Delete the online_bc_trainer
- Delete the tests for online bc
- delete the configuration file for online bc training
* Deleting the BCTeacherHelper.cs Script
TODO :
- Remove usages in the scene
- Documentation Edits
*DO NOT MERGE*
* IMPORTANT : REMOVED ALL IL SCENES
- Removed all the IL scenes from the Examples folder
* Removed all mentions of online BC training in the Documentation
* Made a note in the Migrating.md doc about the removal of the Online BC feature.
* Modified the Academy UI to remove the control checkbox and replaced it with a "train in the editor" checkbox
* Removed the Broadcast functionality from the non-Learning brains
* Bug fix
* Note that the scenes are broken since the BroadcastHub has changed
* Modified the LL-API for Python to remove the broadcasting functionality.
* All unit tests are running
* Modified the scen...
* [WIP] Side Channel initial layout
* Working prototype for raw bytes
* fixing format mistake
* Added some errors and some unit tests in C#
* Added the side channel for the Engine Configuration. (#2958)
* Added the side channel for the Engine Configuration.
Note that this change does not require modifying a lot of files:
- Adding a sender in Python
- Adding a receiver in C#
- Subscribing the receiver to the communicator (a one-liner in the Academy)
- Adding the side channel to the Python UnityEnvironment (not represented here)
Adding the side channel to the environment would look like this:
```python
from mlagents.envs.environment import UnityEnvironment
from mlagents.envs.side_channel.raw_bytes_channel import RawBytesChannel
from mlagents.envs.side_channel.engine_configuration_channel import EngineConfigurationChannel
channel0 = RawBytesChannel()
channel1 = EngineConfigurationChannel()
```
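To complete the picture, a hedged sketch of passing the channels to the environment, assuming the UnityEnvironment constructor of this era accepts a side_channels list; the file name is a placeholder.

```python
# Assumed constructor keyword; "3DBall" is a placeholder for the built environment.
env = UnityEnvironment(file_name="3DBall", side_channels=[channel0, channel1])
```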
* added team id and identifier concat to behavior parameters
* splitting brain params into brain name and identifiers
* set team id in prefab
* receives brain_name and identifier on python side
* rebased with develop
* Correctly calls concatBehaviorIdentifiers
* trainer_controller expects name_behavior_ids
* add_policy and create_policy separated
* adjusting tests to expect trainer.add_policy to be called
* fixing tests
* fixed naming ...
Previously the Curriculum and MetaCurriculum classes required file / folder
paths for initialization. These methods loaded the configuration for the
curricula from the filesystem. Requiring files for configuring curricula
makes testing and updating our config format more difficult.
This change moves the file loading into static methods, so that Curricula /
MetaCurricula can be initialized from dictionaries only.
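A small sketch of the pattern described above, with the filesystem concern isolated in a static helper; the names and the JSON-config assumption are illustrative.

```python
import json

class Curriculum:
    def __init__(self, brain_name: str, config: dict):
        # construction now takes a plain dict, so tests can build one in memory
        self.brain_name = brain_name
        self.config = config

    @staticmethod
    def load_curriculum_file(location: str) -> dict:
        # the static method keeps file loading out of the constructor
        with open(location) as f:
            return json.load(f)

# from a file ...
curriculum = Curriculum("3DBall", Curriculum.load_curriculum_file("config/curricula/3DBall.json"))
# ... or directly from a dict in a unit test
test_curriculum = Curriculum("3DBall", {"measure": "progress", "thresholds": [0.5]})
```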
This PR makes it so that the env_manager only sends one current BrainInfo and the previous actions (if any) to the AgentManager. The list of agents was added to the ActionInfo and used appropriately.
This PR moves the AgentManagers from the TrainerController into the env_manager. This way, the TrainerController only needs to create the components (Trainers, AgentManagers) and call advance() on the EnvManager and the Trainers.
In the previous PR, steps were processed when the env manager was reset. This was an issue for the very first reset, where we don't actually know which agent groups (and AgentManagers) we needed to send the steps to. These steps were being thrown away.
This PR moves the processing of steps to advance(), so that the initial reset steps are simply processed on the next advance() call. This also removes the need for an additional block of code in TrainerController to handle the initial reset.
* [bug-fix] Increase height of wall in CrawlerStatic (#3650)
* [bug-fix] Improve performance for PPO with continuous actions (#3662)
* Corrected a typo in a name of a function (#3670)
OnEpsiodeBegin was corrected to OnEpisodeBegin in Migrating.md document
* Add Academy.AutomaticSteppingEnabled to migration (#3666)
* Fix editor port in Dockerfile (#3674)
* Hotfix memory leak on Python (#3664)
* Hotfix memory leak on Python
* Fixing
* Fixing a bug in the heuristic policy. A decision should not be requested when the agent is done
* [bug-fix] Make Python able to deal with 0-step episodes (#3671)
* adding some comments
Co-authored-by: Ervin T <ervin@unity3d.com>
* Remove vis_encode_type from list of required (#3677)
* Update changelog (#3678)
* Shorten timeout duration for environment close (#3679)
The timeout duration for closing an environment was set to the
same duration as the timeout when waiting ...
This commit surfaces exceptions from environment worker subprocesses,
and changes the SubprocessEnvManager to raise those exceptions when
caught. Additionally TrainerController was changed to treat environment
exceptions differently than KeyboardInterrupts. We now raise the
environment exceptions after exporting the model, so that ML-Agents will
correctly exit with a non-zero return code.
* Update Dockerfile
* Separate send environment data from reset (#4128)
* Fixed a typo on ML-Agents-Overview.md (#4130)
Removed a redundant "to" from the sentence, since it was probably a typo in the document.
* Updated the badge’s link to point to the newest doc version
* Updated all of the doc links to point to release_3_doc
* Fix 3DBall and 3DBallHard SAC regressions (#4132)
* Move memory validation to settings
* Update docs
* Add settings test
* Update to release_3 in installation.md (#4144)
* rename to SideChannelManager +backcompat (#4137)
* Remove comment about logo with --help (#4148)
* [bugfix] Make FoodCollector heuristic playable (#4147)
* Make FoodCollector heuristic playable
* Update changelog
* script to check for old release links and references (#4153)
* Remove package validation suite from Project (#4146)
* RayPerceptionSensor: handle empty and invalid tags (#4155...
* Introduced the Constant Parameter Sampler, which will be useful later since samplers and floats can be used interchangeably
* Refactored settings.py to reflect the new format of the config.yaml
* First working version
* Added the unit tests
* Update to Upgrade for Updates
* fixing the tests
* Upgraded the config files
* Fixes
* Additional error catching
* addressing some comments
* Making the code nicer with cattr
* Added and registered an unstructure hook for ParameterRandomization
* Updating C# Walljump
* Adding comments
* Add test for settings export (#4164)
* Add test for settings export
* Update ml-agents/mlagents/trainers/tests/test_settings.py
Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>
Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>
* Including environment parameters for the test for settings export
* First documentation up...
This change adds an export to .nn for each checkpoint generated by
RLTrainer and adds a NNCheckpointManager to track the generated
checkpoints and final model in training_status.json.
Co-authored-by: Jonathan Harper <jharper+moar@unity3d.com>
* Add torch_utils
* Use torch from torch_utils
* Add torch to banned modules in CI
* Better import error handling
* Fix flake8 errors
* Address comments
* Move networks to GPU if enabled
* Switch to torch_utils
* More flake8 problems
* Move reward providers to GPU/CPU
* Remove another set default tensor
* Fix banned import in test
* Moved components to the tf folder and moved the TrainerFactory to the `trainer` folder
* Addressing comments
* Editing the migrating doc
* fixing test
* Torch setup.py
* Set torch to default
* Make torch default in setup.py
* Remove indents
* Remove other instances of TF being used
* Add tensorboard to setup.py
* Adding correct setup commands for verifying torch is installed (#4524)
* Adding correct setup commands for verifying torch is installed
* Editing the test_requirements to add tf and remove torch
* Develop torchdefault raise outside setup (#4530)
* Torch not imported error to raise at first usage
* Torch not imported error to raise at first usage
* [refactor] Use PyTorch TensorBoard utils (#4518)
* Convert stats writer to use PyTorch TB support
* Use common function to print params
* Update test
* Bump tensorboard to 1.15 to fix the tests
* putting tensorboard 1.15.0 as min version requirement
Co-authored-by: vincentpierre <vincentpierre@unity3d.com>
* [Docs] Initial documentation changes for making...