On Windows the interrupt for subprocesses works in a different
way from OSX/Linux. The result is that child subprocesses and
their pipes may close while the parent process is still running
during a keyboard (ctrl+C) interrupt.
To handle this, this change catches EOFError and BrokenPipeError
exceptions when interacting with subprocess environments. It also
ensures that, for parallel runs started with the "num-runs" option,
the threads for each run are joined and KeyboardInterrupts are
handled.
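As a rough illustration of the pipe handling described above (not the code from this change), the sketch below wraps pipe reads and writes so that a closed child pipe is reported instead of raising. The helper names `safe_recv` and `safe_send` are hypothetical.

```python
from multiprocessing import Pipe


def safe_recv(conn):
    """Return the next message, or None if the other end of the pipe closed."""
    try:
        return conn.recv()
    except (EOFError, BrokenPipeError):
        # The child may already be gone, e.g. after Ctrl+C on Windows.
        return None


def safe_send(conn, message):
    """Send a message; return False instead of raising if the pipe is closed."""
    try:
        conn.send(message)
        return True
    except (EOFError, BrokenPipeError):
        return False


# Simulate a child whose pipe closes while the parent is still running.
parent_conn, child_conn = Pipe()
child_conn.send("ready")
first = safe_recv(parent_conn)   # message sent before the close
child_conn.close()
second = safe_recv(parent_conn)  # the pipe is closed, so recv() hits EOFError
parent_conn.close()
```

Returning a sentinel lets the parent finish its own shutdown (joining threads, closing other workers) rather than dying on the first closed pipe.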
These changes made the "_win_handler" we used to specially
manage interrupts on Windows unnecessary, so it has been
removed.
When using the SubprocessUnityEnvironment, parallel writes are
made to UnitySDK.log. This causes file access violation issues
in Windows/C#. This change modifies the access and sharing mode
for our writes to UnitySDK.log to fix the issue.
SubprocessUnityEnvironment sends an environment factory function to
each worker, which the worker uses to create the UnityEnvironment it
interacts with. We use Python's standard multiprocessing library, which pickles
all data sent to the subprocess. The built-in pickle library doesn't
pickle function objects on Windows machines (tested with Python 3.6 on
Windows 10 Pro).
This PR adds cloudpickle as a dependency in order to serialize the
environment factory. Other implementations of subprocess environments
do the same:
https://github.com/openai/baselines/blob/master/baselines/common/vec_env/subproc_vec_env.py
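A minimal sketch of the problem, assuming the factory is a nested closure: the standard pickle module serializes functions by qualified name, so it cannot serialize a function defined inside another function. The names `make_env_factory` and `can_std_pickle` are hypothetical, and the cloudpickle round trip only runs if cloudpickle is installed.

```python
import pickle


def make_env_factory(base_port):
    """Return a closure that stands in for the environment factory."""
    def create_env(worker_id):
        # Stand-in for constructing a UnityEnvironment.
        return {"port": base_port + worker_id}
    return create_env


def can_std_pickle(obj):
    """Report whether the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, TypeError, pickle.PicklingError):
        return False


factory = make_env_factory(5005)
std_ok = can_std_pickle(factory)  # nested closures are not picklable by name

try:
    import cloudpickle  # third-party; optional in this sketch
except ImportError:
    cloudpickle = None

if cloudpickle is not None:
    # cloudpickle serializes the function body and its captured variables.
    restored = cloudpickle.loads(cloudpickle.dumps(factory))
    assert restored(2) == {"port": 5007}
```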
We need to document the meaning of the two new flags added for
multi-environment training. We may also want to add more specific
instructions for people wanting to speed up training in the future.
- Ticked API for pypi for mlagents
- Ticked API for pypi for mlagents_envs
- Ticked Communication number for API
- Ticked API for unity-gym
- Ticked the API for the pytest
This commit adds support for running Unity environments in parallel.
An abstract base class was created for UnityEnvironment which a new
SubprocessUnityEnvironment inherits from.
SubprocessUnityEnvironment communicates with its workers through
pipes, sending each worker commands that the workers run in parallel.
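The pattern can be sketched roughly as follows. This is an illustration under assumptions, not the code from this commit: `FakeEnv`, `worker`, `step_all`, and the `("step", payload)` command format are hypothetical, and a thread stands in for each worker process to keep the example portable.

```python
import threading
from multiprocessing import Pipe


class FakeEnv:
    """Hypothetical stand-in for a UnityEnvironment."""

    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.steps = 0

    def step(self, action):
        self.steps += 1
        return (self.worker_id, self.steps, action)


def worker(conn, worker_id):
    """Serve (command, payload) tuples from the pipe until told to close."""
    env = FakeEnv(worker_id)
    while True:
        cmd, payload = conn.recv()
        if cmd == "step":
            conn.send(env.step(payload))
        elif cmd == "close":
            conn.close()
            break


def step_all(parent_conns, action):
    """Send 'step' to every worker first, then collect all the replies."""
    for conn in parent_conns:
        conn.send(("step", action))
    return [conn.recv() for conn in parent_conns]


def run_demo(num_envs=3):
    parent_conns, threads = [], []
    for worker_id in range(num_envs):
        parent_conn, child_conn = Pipe()
        # Assumption: a thread replaces multiprocessing.Process here.
        t = threading.Thread(target=worker, args=(child_conn, worker_id))
        t.start()
        parent_conns.append(parent_conn)
        threads.append(t)
    results = step_all(parent_conns, "move")
    for conn in parent_conns:
        conn.send(("close", None))
    for t in threads:
        t.join()
    return results
```

Sending every command before reading any reply is what lets the workers step concurrently instead of one after another.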
A few significant changes needed to be made as a side-effect:
* UnityEnvironments are created via a factory method (a closure)
rather than being directly created by the main process.
* In mlagents-learn "worker-id" has been replaced by "base-port"
and "num-envs", and worker_ids are automatically assigned across runs.
* BrainInfo objects now convert all fields to numpy arrays or lists to
avoid serialization issues.
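A rough sketch of the factory idea from the list above: the main process builds a closure that captures the shared settings, and each worker calls it with its own worker_id. `StubEnv`, `create_environment_factory`, and the `base_port + worker_id` port arithmetic are assumptions made for illustration.

```python
class StubEnv:
    """Hypothetical stand-in for a UnityEnvironment."""

    def __init__(self, env_path, port, worker_id):
        self.env_path = env_path
        self.port = port
        self.worker_id = worker_id


def create_environment_factory(env_path, base_port):
    """Return a closure that builds one environment per worker."""
    def create_env(worker_id):
        # Assumption: each worker gets its own port, offset from base_port.
        return StubEnv(env_path, port=base_port + worker_id, worker_id=worker_id)
    return create_env


# The main process creates the factory; each worker calls it with its own id.
factory = create_environment_factory("3DBall", base_port=5005)
envs = [factory(worker_id) for worker_id in range(3)]
ports = [env.port for env in envs]
```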
* Added RenderTexture support for visual observations
* Cleaned up new ObservationToTexture function
* Added check for width/height of RenderTexture
* Added check to hide HelpBox unless both cameras and RenderTextures are used
* Added documentation for Visual Observations using RenderTextures
* Added GridWorldRenderTexture Example scene
* Adjusted image size of doc images
* Added GridWorld example reference
* Fixed missing reference in the GridWorldRenderTexture scene and resaved the agent prefab
* Fix prefab instantiation and render timing in GridWorldRenderTexture
* Added screenshot and reworded documentation
* Unchecked control box
* Rename renderTexture
* Make RenderTexture scene default for GridWorld
Co-authored-by: Mads Johansen <pyjamads@gmail.com>
* fixed the test break on pytest > 4.0, added the pytest cov
* added the pytest-cov package
* added the logic to upload coverage.yml report to codacy
* removed the warning message during the pytest
* added the codacy badge to show what it looks like
* added a space
* removed the space
* removed the duplicate pytest
* removed the extra spaces
* added the test coverage badge
* point the badge to the test branch
* changed
* moved the python test coverage to circleci
* removed the badge
* added the badge
* fixed the link
* Added the gym_unity test to the circleci
* Fixed the gym_unity installation
* Changed the test-reports from the ml-agents subfolder to the root folder, so that it covers gym_unity’s pytest also
* Garbage collection optimisations:
- Changed a few IEnumerable instances to IReadOnlyList. This avoids some unnecessary GC allocs that cast the Lists to IEnumerables.
- Moved cdf allocation outside of the loop to avoid unnecessary GC allocation.
- Changed GeneratorImpl to use plain float and int arrays instead of Array during generation. This avoids SetValue performing boxing on the arrays, which eliminates an awful lot of GC allocs.
* Convert InferenceBrain to use IReadOnlyList to avoid garbage creation.
* Fix typos
* Use abstract class for rayperception
* Created RayPerception2D. (#1721)
* Incorporate RayPerception2D
* Fix typo
* Make abstract class
* Add tests
* Fix for Brains not reinitialising when the scene is reloaded.
This was a bug caused by the conversion of Brains over to ScriptableObjects. ScriptableObjects persist in memory between scene changes, which means that after a scene change the Brains would still be initialised and the agentInfos list would contain invalid references to the Agents from the previous scene.
The fix is to have the Academy notify the Brains when it is destroyed. This allows the Brains to clean themselves up and transition back to an uninitialised state. After the new scene is loaded, the Brain's LazyInitialise will reconnect the Brain to the new Academy as expected.
* Move 'take_action' into Policy class
This refactor is part of Actor-Trainer separation. Since policies
will be distributed across actors in separate processes which share
a single trainer, taking an action should be the responsibility of
the policy.
This commit also makes a few smaller changes:
* Combines `take_action` logic between trainers, making it more
generic
* Adds an `ActionInfo` data class to be more explicit about the
data returned by the policy, only used by TrainerController and
policy for now.
* Moves trainer stats logic out of `take_action` and into
`add_experiences`
* Renames 'take_action' to 'get_action'
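To make the shape of this refactor concrete, here is a minimal sketch under assumptions. The fields of `ActionInfo`, the `evaluate` method, and `ConstantPolicy` are hypothetical and chosen only for illustration; the actual data class may carry different fields.

```python
from typing import Any, Dict, NamedTuple, Optional


class ActionInfo(NamedTuple):
    """Explicit container for what a policy returns (fields are assumed)."""
    action: Any
    value: Optional[Any]
    outputs: Dict[str, Any]


class Policy:
    def evaluate(self, observations):
        """Hypothetical hook: run the model and return a dict of outputs."""
        raise NotImplementedError

    def get_action(self, observations) -> ActionInfo:
        # Choosing the action is the policy's job, not the trainer's.
        outputs = self.evaluate(observations)
        return ActionInfo(
            action=outputs.get("action"),
            value=outputs.get("value"),
            outputs=outputs,
        )


class ConstantPolicy(Policy):
    """Toy policy that returns a zero action for every observation."""

    def evaluate(self, observations):
        n = len(observations)
        return {"action": [0.0] * n, "value": [1.0] * n}


info = ConstantPolicy().get_action([[0.1, 0.2], [0.3, 0.4]])
```

A named container lets callers such as a trainer controller read `info.action` without knowing how the policy produced it.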
Removing this function breaks some tests, and the only way around
this at this time is a bigger refactor or hacky fixes to tests.
For now, I'd suggest we just revert this small part of a change
and keep a refactor in mind for the future.