- Fix issue with BC Trainer `increment_steps`.
- Fix issue with Demonstration Recorder and visual observations (memory leak fix was deleting vis obs too early).
- Make Samplers sample from the same random seed every time, so generalization runs are repeatable (see the sketch after this list).
- Fix crash when using GAIL, Curiosity, and visual observations together.
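A minimal sketch of the seeding idea behind the Samplers fix; the class and method names here are illustrative, not the actual ml-agents API:

```python
import numpy as np

class UniformSampler:
    """Illustrative sampler for generalization (environment) parameters."""

    def __init__(self, min_value: float, max_value: float, seed: int = 0):
        # A fixed seed makes every run draw the same parameter sequence,
        # so generalization experiments are repeatable.
        self.rng = np.random.RandomState(seed)
        self.min_value = min_value
        self.max_value = max_value

    def sample_parameter(self) -> float:
        return self.rng.uniform(self.min_value, self.max_value)
```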
* Update issue templates
Added three issue template types:
* bug reports
* feature requests
* discussion and general questions
The focus is on gathering enough information in bug reports to help us reproduce the issue, and on communicating that we are generally unable to support custom environments or participate in general discussions.
* Documentation tweaks and updates (#1479)
* Add blurb about using the --load flag in the intro guide, and typo fix.
* Add section in tutorial to create multiple area learning environment.
* Add mention of Done() method in agent design
* fixed the Windows Ctrl-C bug (see the sketch after this list)
* fixed typo
* removed some unnecessary printing
* make the import of the Windows API conditional
* removed the duplicate code
* added the ability to use the Python debugger on ml-agents
* added a newline at the end; changed the import to use the complete path
* changed info.log to policy.export_model; changed the sys.platform check to use startswith
* fixed a bug
* remove the printing of the path
* tweaked the info message to notify the user about the expected error message
* removed some logging per review comments
* removed the sys import
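The conditional-import pattern mentioned above looks roughly like this (a sketch only; `win32api` from pywin32 and the handler body are assumptions, not the exact ml-agents code):

```python
import sys

def _on_console_ctrl(ctrl_type):
    # Placeholder for cleanup, e.g. exporting the trained model
    # before the Windows console tears the process down.
    return True  # report the event as handled

if sys.platform.startswith("win"):
    # The Win32 console API only exists on Windows, so guard the import
    # instead of failing at import time on Linux/macOS.
    import win32api  # assumption: provided by pywin32
    win32api.SetConsoleCtrlHandler(_on_console_ctrl, True)
```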
* Revert "Documentation tweaks and updates (#1479)"
This reverts commit 84ef07a4525fa8a89f4...
As of v0.6, the WallJump example has new brain names while PushBlock
doesn't support curriculum learning. This change renames the WallJump
curriculum files and removes the PushBlock files.
* Enable buffer padding to be set to values other than 0
Allows the padding in AgentBufferField to be set to a custom value. In particular, 0-padding for `action_masks` causes a divide-by-zero error; those fields should be padded with 1s instead.
The pad value is passed as a parameter to the `append` method, so it can be set right after the instantiation of an AgentBufferField.
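A minimal sketch of the described behavior, assuming the names from the description above (`AgentBufferField`, `append`); the real class in ml-agents may differ in detail:

```python
from typing import Any

class AgentBufferField(list):
    """Sketch: a buffer field that remembers a custom pad value."""

    def __init__(self):
        super().__init__()
        self.padding_value = 0.0

    def append(self, element: Any, padding_value: float = 0.0) -> None:
        # Remember the pad value alongside the data so later padding
        # (e.g. when aligning trajectory lengths) uses it instead of 0.
        super().append(element)
        self.padding_value = padding_value

# Action masks must be padded with 1 ("action allowed"): padding with 0
# later produces a divide-by-zero when probabilities are renormalized.
masks = AgentBufferField()
masks.append([1, 1, 0], padding_value=1.0)
```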
The calculation of the observation vectors is faulty. The old calculation does not reflect the distances to the edges, and it yields values outside the intended [-1, 1] range. Since a correct distance calculation would have been difficult in one line, I replaced it with the relative position of the ball (using only two vectors instead of four). I ran 500K-step reinforcement trainings before and after the change and got enormously improved results. Contact me for TensorBoard screenshots, or just use the debugger and do the math.
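For illustration only (the actual agent code is C# in Unity, and the function name, field dimensions, and clamping below are assumptions), the replacement boils down to observing the ball's position relative to the agent, normalized into [-1, 1]:

```python
def relative_ball_observation(agent_x, agent_z, ball_x, ball_z,
                              field_width, field_length):
    """Hypothetical sketch of the new observation: two normalized
    relative-position components instead of four edge distances."""
    def clamp(v):
        return max(-1.0, min(1.0, v))
    # Dividing by the field dimensions keeps each component in [-1, 1]
    # for any ball position on the field.
    return [clamp((ball_x - agent_x) / field_width),
            clamp((ball_z - agent_z) / field_length)]
```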