ml-agents

Author	SHA1	Message	Commit date
Arthur Juliani	982fab41	Initial commit	7 years ago
vincentpierre	cde3c8f7	formating and added documentation	7 years ago
Arthur Juliani	71591043	PPO additions and warnings * Add linear decay to learning rate for PPO * Add warning/exception for unsupported brain configurations w/ PPO	7 years ago
GitHub	aee5d336	Fix discrete state (#33) * made BrainParameters a class to set default values Modified the error message if the state is discrete * Add discrete state support to PPO and provide discrete state example environment * Add flexibility to continuous control as well * Finish PPO flexible model generation implementation * Fix formatting * Support color observations * Add best practices document * bug fix for non square observations * Update Readme.md * Remove scipy dependency * Add installation doc	7 years ago
Arthur Juliani	adac2683	Fix for multi-agent with observations	7 years ago
Arthur Juliani	06d9bbec	Log lesson in TensorBoard	7 years ago
vincentpierre	22db3d64	added the modified files from dev-cooperative-env	7 years ago
Arthur Juliani	51f23cd2	0.2 Update * added broadcast to the player and heuristic brain. Allows the python API to record actions taken along with the states and rewards * removed the broadcast checkbox Added a Handshake method for the communicator The academy will try to handshake regardless of the brains present Player and Heuristic brains will send their information through the communicator but will not receive commands * bug fix : The environment only requests actions from external brains when unique * added warning in case no brins are set to external * fix on the instanciation of coreBrains, fix on the conversion of actions to arrays in the BrainInfo received from step * default discrete action is now 0 bug fix for discrete broadcast action (the action size should be one in Agents.cs) modified Tennis so that the default action is no action modified the TemplateDecsion.cs to ensure non null values are sent from Decide() and MakeMemory() * minor fixes * need to convert the s...	7 years ago
Arthur Juliani	b56259f6	Fix cumulative reward (Unity) and Nan reward (python) bugs	7 years ago
Arthur Juliani	75ea16ff	Add comments and alphabetize flags	7 years ago
Arthur Juliani	adedd491	Initial support for multiple observations (#256) * Initial support for multiple observations * Fix PPO for continuous control	7 years ago
Arthur Juliani	54652c69	dev-logParam (#135) * added the method write text to trainer so it is easy to write log the hyperparameters as a dictionary. Note: needs tensorflow version r1.2 or above * added message if impossible to write text summary in Tensorboard	7 years ago
GitHub	faa53e35	Fix observations on PPO trainer (#340) * Fix observations on PPO trainer * tested and fixed the fix	7 years ago
Arthur Juliani	c21a391d	Various bug fixed and changes * Adjust demo curricula * Fix training buffer reset bug * Make wall height a float * Add pertained models for Area env	7 years ago
Arthur Juliani	9d26767d	Instantiate training buffer with trainer	7 years ago

15 commits (94025cbf-09b8-47db-917a-9210a7ecca16)