* Move 'take_action' into Policy class
This refactor is part of Actor-Trainer separation. Since policies
will be distributed across actors in separate processes which share
a single trainer, taking an action should be the responsibility of
the policy.
This change makes a few smaller changes:
* Combines `take_action` logic between trainers, making it more
generic
* Adds an `ActionInfo` data class to be more explicit about the
data returned by the policy, only used by TrainerController and
policy for now.
* Moves trainer stats logic out of `take_action` and into
`add_experiences`
* Renames 'take_action' to 'get_action'
Previously in v0.8 we added parallel environments via the
SubprocessUnityEnvironment, which exposed the same abstraction as
UnityEnvironment while actually wrapping many parallel environments
via subprocesses.
Wrapping many environments with the same interface as a single
environment had some downsides, however:
* Ordering needed to be preserved for agents across different envs,
complicating the SubprocessEnvironment logic
* Asynchronous environments with steps taken out of sync with the
trainer aren't viable with the Environment abstraction
This PR introduces a new EnvManager abstraction which exposes a
reduced subset of the UnityEnvironment abstraction and a
SubprocessEnvManager implementation which replaces the
SubprocessUnityEnvironment.
We have been ignoring unused imports and star imports via flake8. These are
both bad practice and grow over time without automated checking. This
commit attempts to fix all existing import errors and add back the corresponding
flake8 checks.
* Initial commit removing memories from C# and deprecating memory fields in proto
* initial changes to Python
* Adding functionalities
* Fixes
* adding the memories to the dictionary
* Fixing bugs
* tweeks
* Resolving bugs
* Recreating the proto
* Addressing comments
* Passing by reference does not work. Do not merge
* Fixing huge bug in Inference
* Applying patches
* fixing tests
* Addressing comments
* Renaming variable to reflect type
* test
This PR makes it so that the env_manager only sends one current BrainInfo and the previous actions (if any) to the AgentManager. The list of agents was added to the ActionInfo and used appropriately.
This PR moves the AgentManagers from the TrainerController into the env_manager. This way, the TrainerController only needs to create the components (Trainers, AgentManagers) and call advance() on the EnvManager and the Trainers.
* [skip ci] WIP : Modify the base_env.py file
* [skip ci] typo
* [skip ci] renamed some methods
* [skip ci] Incorporated changes from our meeting
* [skip ci] everything is broken
* [skip ci] everything is broken
* [skip ci] formatting
* Fixing the gym tests
* Fixing bug, C# has an error that needs fixing
* Fixing the test
* relaxing the threshold of 0.99 to 0.9
* fixing the C# side
* formating
* Fixed the llapi integratio test
* [Increasing steps for testing]
* Fixing the python tests
* Need __contains__ after all
* changing the max_steps in the tests
* addressing comments
* Making env_manager logic clearer as proposed in the comments
* Remove duplicated logic and added back in episode length (#3728)
* removing mentions of multi-agent in gym and changed the docstring in base_env.py
* Edited the Documentation for the changes to the LLAPI (#3733)
* Edite...
* [bug-fix] Fix regression in --initialize-from feature (#4086)
* Fixed text in GettingStarted page specifying the logdir for tensorboard. Before it was in a directory summaries which no longer existed. Results are now saved to the results dir. (#4085)
* [refactor] Remove nonfunctional `output_path` option from TrainerSettings (#4087)
* Reverting bug introduced in #4071 (#4101)
Co-authored-by: Scott <Scott.m.jordan91@gmail.com>
Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>