ml-agents

Author	SHA1	Message	Commit date
GitHub	fbf92810	Refactor Trainers to use Policy (#1098)	6 years ago
GitHub	10d2a19d	Release v0.5 (Develop) (#1203)	6 years ago
GitHub	ab5c49e8	Release v0.5 delete unityagents (#1151) * fixed the loggers * Modified the documentation	6 years ago
GitHub	2d4b4209	Use single scope declaration for models (#1160)	6 years ago
GitHub	d2c320dd	Remove graph scope (#1205) * initial commit : Only works with PPO balance ball * Fix for recurrent * [Fix indentation error] * Fixed BC * Remove Dead code * Addressing comment : Removing dead code * Fixing the Pytest * edited comments * Removing GraphScope from the InternalBrain (#1227) * Documentation changes for removing graph scope (#1226) * Documentation changes * removed the keep checkpoint printing	6 years ago
GitHub	6c354d16	New Learning Brain (#1303) * Initial Commit * attempt at refactor * Put all static methods into the CoreInternalBrain * improvements * more testing * modifications * renamed epsilon * misc * Now supports discrete actions * added discrete support and RNN and visual. Left to do is refactor and save variables into models * code cleaning * made a tensor generator and applier * fix on the models.py file * Moved the Checks to a different Class * Added some unit tests * BugFix * Need to generate the output tensors as well as inputs before executing the graph * Made NodeNames static and created a new namespace * Added comments to the TensorAppliers * Started adding comments on the TensorGenerators code * Added comments for the Tensor Generator * Moving the helper classes into a separate folder * Added initial comments to the TensorChecks * Renamed NodeNames -> TensorNames * Removing warnings in tests * Now using Aut...	6 years ago
vincentpierre	47de43f6	reverted .tf to .bytes into the policy.py script	6 years ago
GitHub	cc083fd8	fixed the windows ctrl-c bug (#1558) * Documentation tweaks and updates (#1479) * Add blurb about using the --load flag in the intro guide, and typo fix. * Add section in tutorial to create multiple area learning environment. * Add mention of Done() method in agent design * fixed the windows ctrl-c bug * fixed typo * removed some uncessary printing * nothing * make the import of the win api conditional * removved the duplicate code * added the ability to use python debugger on ml-agents * added newline at the end, changed the import to be complete path * changed the info.log into policy.export_model, changed the sys.platform to use startswith * fixed a bug * remove the printing of the path * tweaked the info message to notify the user about the expected error message * removed some logging according to comments * removed the sys import * Revert "Documentation tweaks and updates (#1479)" This reverts commit 84ef07a4525fa8a89f4...	6 years ago
Vincent-Pierre BERGES	4a6ae4e0	Barracuda integration into ML-Agents (#1557) * Switched default Mac GFX API to Metal * Added Barracuda pre-0.1.5 * Added basic integration with Barracuda Inference Engine * Use predefined outputs the same way as for TF engine * Fixed discrete action + LSTM support * Switch Unity Mac Editor to Metal GFX API * Fixed null model handling * All examples converted to support Barracuda * Added model conversion from Tensorflow to Barracuda copied the barracuda.py file to ml-agents/mlagents/trainers copied the tensorflow_to_barracuda.py file to ml-agents/mlagents/trainers modified the tensorflow_to_barracuda.py file so it could be called from mlagents modified ml-agents/mlagents/trainers/policy.py to convert the tf models to barracuda compatible .bytes file * Added missing iOS BLAS plugin * Added forgotten prefab changes * Removed GLCore GFX backend for Mac, because it doesn't support Compute shaders * Exposed GPU support for LearningBrain inference ...	6 years ago
GitHub	c258b1c3	Move 'take_action' into Policy class (#1669) * Move 'take_action' into Policy class This refactor is part of Actor-Trainer separation. Since policies will be distributed across actors in separate processes which share a single trainer, taking an action should be the responsibility of the policy. This change makes a few smaller changes: * Combines `take_action` logic between trainers, making it more generic * Adds an `ActionInfo` data class to be more explicit about the data returned by the policy, only used by TrainerController and policy for now. * Moves trainer stats logic out of `take_action` and into `add_experiences` * Renames 'take_action' to 'get_action'	6 years ago
GitHub	cfb8f208	Release v0.7 minor fixes (#1759) * Fix typo * Updated some of the scenes	6 years ago
GitHub	20ff1436	Merge pull request #1765 from Unity-Technologies/release-v0.7 Release v0.7 into develop	6 years ago
eshvk	ef8009d9	Python code reformat via [`black`](https://github.com/ambv/black). Features: - Reformat code via black. - Adding circleci configurations. - Add contribution guidelines. Steps to reproduce: - `pip install black` - `black <source code directory>`	6 years ago
GitHub	2671e1a0	Enable mypy in precommit checks (#2177) * WIP precommit on top level * update CI * circleci fixes * intentionally fail black * use --show-diff-on-failure in CI * fix command order * rebreak a file * apply black * WIP enable mypy * run mypy on each package * fix trainer_metrics mypy errors * more mypy errors * more mypy * Fix some partially typed functions * types for take_action_outputs * fix formatting * cleanup * generate stubs for proto objects * fix ml-agents-env mypy errors * disallow-incomplete-defs for gym-unity * Add CI notes to CONTRIBUTING.md	5 years ago
GitHub	4ac79742	Refactor reward signals into separate class (#2144) * Create new class (RewardSignal) that represents a reward signal. * Add value heads for each reward signal in the PPO model. * Make summaries agnostic to the type of reward signals, and log weighted rewards per reward signal. * Move extrinsic and curiosity rewards into this new structure. * Allow defining multiple reward signals in YAML file. Add documentation for this new structure.	5 years ago
GitHub	b05c9ac1	Add environment manager for parallel environments (#2209) Previously in v0.8 we added parallel environments via the SubprocessUnityEnvironment, which exposed the same abstraction as UnityEnvironment while actually wrapping many parallel environments via subprocesses. Wrapping many environments with the same interface as a single environment had some downsides, however: * Ordering needed to be preserved for agents across different envs, complicating the SubprocessEnvironment logic * Asynchronous environments with steps taken out of sync with the trainer aren't viable with the Environment abstraction This PR introduces a new EnvManager abstraction which exposes a reduced subset of the UnityEnvironment abstraction and a SubprocessEnvManager implementation which replaces the SubprocessUnityEnvironment.	5 years ago
GitHub	a9fe719c	Add Multi-GPU implementation for PPO (#2288) Add MultiGpuPPOPolicy class and command line options to run multi-GPU training	5 years ago
GitHub	832e4a47	Normalize observations when adding experiences (#2556) * Normalize observations when adding experiences This change moves normalization of vector observations into the trainer's "add_experiences" interface. Prior to this change, normalization occurred at inference time. This was somewhat confusing since usually executing a forward pass shouldn't have side-effects which would change the training step. Also, in a asynchronous or distributed setting where we copy the neural network weights from a trainer to a remote actor / inference worker we'd end up with training issues because of the weights being different on the trainer than the workers.	5 years ago
GitHub	67d754c5	Fix flake8 import warnings (#2584) We have been ignoring unused imports and star imports via flake8. These are both bad practice and grow over time without automated checking. This commit attempts to fix all existing import errors and add back the corresponding flake8 checks.	5 years ago
GitHub	36ed3c16	Fix issue exporting graph with multi-GPU (#2573) Our multi-GPU training had a regression such that freezing the graph was broken. This change fixes that issue by making a few changes: * Removes the top level "tower" variable scope added by multi-GPU so that the output nodes have correct names * Removes the use of "freeze_graph" and replaces it with our own similar functionality. * Adds the "auto reuse" to network layers which require them	5 years ago
Jonathan Harper	3fc14963	EXPERIMENTAL horovod support	5 years ago
Chris Elion	43e23941	rough pass at tf2 support, needs cleanup	5 years ago
Chris Elion	806c77e4	centralize tensorflow imports	5 years ago
GitHub	0fe5adc2	Develop remove memories (#2795) * Initial commit removing memories from C# and deprecating memory fields in proto * initial changes to Python * Adding functionalities * Fixes * adding the memories to the dictionary * Fixing bugs * tweeks * Resolving bugs * Recreating the proto * Addressing comments * Passing by reference does not work. Do not merge * Fixing huge bug in Inference * Applying patches * fixing tests * Addressing comments * Renaming variable to reflect type * test	5 years ago
Chris Elion	691d21e6	Merge remote-tracking branch 'origin/develop' into try-tf2-support	5 years ago
GitHub	c6c01a03	Enable pylint and fix a few things (#2767) * enable pylint, disable some messages and fix a few * SAC memories in init	5 years ago
GitHub	ccb7eab4	Remove {text,custom} {action,observations} (#2839) * delete text actions and obs * delete custom actions and obs * regenerate protos * cleanup C# * format * fix tests * fix base env signature * doc cleanup	5 years ago
Chris Elion	fca51de8	Merge remote-tracking branch 'origin/develop' into try-tf2-support	5 years ago
Chris Elion	73a346cb	cleanup	5 years ago
GitHub	69d1a033	Develop remove past action communication (#2913) * Modifying the .proto files * attempt 1 at refactoring Python * works for ppo hallway * changing the documentation * now works with both sac and ppo both training and inference * Ned to fix the tests * TODOs : - Fix the demonstration recorder - Fix the demonstration loader - verify the intrinsic reward signals work - Fix the tests on Python - Fix the C# tests * Regenerating the protos * fix proto typo * protos and modifying the C# demo recorder * modified the demo loader * Demos are loading * IMPORTANT : THESE ARE THE FILES USED FOR CONVERSION FROM OLD TO NEW FORMAT * Modified all the demo files * Fixing all the tests * fixing ci * addressing comments * removing reference to memories in the ll-api	5 years ago
GitHub	652488d9	check for numpy float64 (#2948)	5 years ago
GitHub	681093cf	cherry pick PR#3032 (#3066)	5 years ago
GitHub	ef2514ba	Develop cold fix recurrent (#3032) * Fixing the value estimate with recurrent * fix typing * Fix type check	5 years ago
GitHub	36048cb6	Moving Env Manager to Trainers (#3062) The Env Manager is only used by the trainer codebase. The entry point to interact with an environment is UnityEnvironment. * Moving Env Manager to Trainers * fix pylint madness	5 years ago
GitHub	90db165f	Add --namespace-packages to mypy for mlagents (#3075)	5 years ago
Chris Elion	fdc810ff	move (first pass)	5 years ago
Ervin Teng	2b811fc8	Properly report value estimates and episode length	5 years ago
GitHub	2fd305e7	Move add_experiences out of trainer, add Trajectories (#3067)	5 years ago
Ervin Teng	c330f6f6	Merge branch 'master' into develop-agentprocessor	5 years ago
Ervin Teng	fdf9aea7	Make conversion methods part of NamedTuples	5 years ago
Ervin Teng	1bd791e5	Merge branch 'master' into develop-agentprocessor	5 years ago
GitHub	d798b1cb	Prevent tf.Session() from eating up all the GPU memory (#3219) * Use soft placement and allow_growth for Session * Move config generation to tf utils * Re-add self.graph	5 years ago
GitHub	4c241a80	Only send previous action and current BrainInfo (#3187) This PR makes it so that the env_manager only sends one current BrainInfo and the previous actions (if any) to the AgentManager. The list of agents was added to the ActionInfo and used appropriately.	5 years ago
GitHub	f058b18c	Replace BrainInfos with BatchedStepResult (#3207)	5 years ago
Ervin Teng	76ad64d7	Some more bugfixes	5 years ago
Ervin Teng	29f3330f	Merge master into hotfix-0.13.1	5 years ago
Ervin Teng	2b63415e	Clean up policy files	5 years ago
GitHub	ca96b293	Move advance() logic for environment manager out of trainer_controller (#3234) This PR moves the AgentManagers from the TrainerController into the env_manager. This way, the TrainerController only needs to create the components (Trainers, AgentManagers) and call advance() on the EnvManager and the Trainers.	5 years ago
Ervin Teng	9ad99eb6	Combined model and policy for PPO	5 years ago
Ervin Teng	164732a9	Move optimizer creation to Trainer, fix some of the reward signals	5 years ago
Ervin Teng	0ef40c08	SAC CC working	5 years ago
GitHub	14193ada	Self-play for symmetric games (#3194)	5 years ago
Ervin Teng	db249ceb	Merge branch 'master' into develop-splitpolicyoptimizer	5 years ago
Ervin Teng	edeceefd	Zeroed version of LSTM working for PPO	5 years ago
Ervin Teng	4de71b84	0 out value estimates as well	5 years ago
Ervin Teng	4871f49c	Fix comments for PPO	5 years ago
Ervin Teng	cadf6603	Fix SAC CC and some reward signal tests	5 years ago
GitHub	dd86e879	Separate out optimizer creation and policy graph creation (#3355)	5 years ago
Ervin Teng	cdd57468	Re-fix scoping and add method to get all variables	5 years ago
Ervin Teng	48b39b80	Fix ghost trainer and all tests	5 years ago
Ervin Teng	c350c6d8	Added enforcement of m_size to be divisible by 2	5 years ago
Ervin Teng	441e6a0c	Add typing to optimizer, rename self.tf_optimizer	5 years ago
Ervin Teng	7004604d	Used NamedTuple for create normalization tensors	5 years ago
Ervin Teng	8abd4129	Clean up nn_policy	5 years ago
Ervin Teng	7c0fa1c4	Remove action_holder placeholder	5 years ago
GitHub	587dd165	Support for ONNX export (#3101)	5 years ago
GitHub	3641293f	Change checkpoint suffix to "ckpt" (#3470) Tensorflow doesn't prescribe any particular file suffix for checkpoint files, but they are commonly referred to as "ckpt" as a shorthand for "checkpoint". However ours is somewhat confusingly "cptk". This change simply changes our checkpoint suffix to "ckpt".	5 years ago
Ervin Teng	bcc25d59	Merge branch 'master' into develop-splitpolicyoptimizer	5 years ago
Ervin Teng	1cfc461a	Remove and rename tf_optimizer	5 years ago
Ervin Teng	63463bd1	Make TF graph seed deterministic	5 years ago
GitHub	c145e75b	Split Policy and Optimizer, common Policy for PPO and SAC (#3345)	5 years ago
Ervin Teng	1156b9b3	Merge branch 'develop-splitpolicyoptimizer' into develop-removeactionholder	5 years ago
Ervin Teng	53c25fb1	Move one-hot out of policy and remove selected_actions	5 years ago
GitHub	7d954797	[change] Separate action outputs into OutputDistributions object (#3514)	5 years ago
Anupam Bhatnagar	f4dbedcf	removed extraneous logging imports and loggers	5 years ago
Anupam Bhatnagar	e8e0078e	first commit	5 years ago
Anupam Bhatnagar	07b15ae7	[skip-ci] small refactors	5 years ago
GitHub	873ba7fd	[bug-fix] Fix stats reporting for reward signals in SAC (#3606)	5 years ago
GitHub	ec278616	Hotfixes for Release 0.15.1 (#3698) * [bug-fix] Increase height of wall in CrawlerStatic (#3650) * [bug-fix] Improve performance for PPO with continuous actions (#3662) * Corrected a typo in a name of a function (#3670) OnEpsiodeBegin was corrected to OnEpisodeBegin in Migrating.md document * Add Academy.AutomaticSteppingEnabled to migration (#3666) * Fix editor port in Dockerfile (#3674) * Hotfix memory leak on Python (#3664) * Hotfix memory leak on Python * Fixing * Fixing a bug in the heuristic policy. A decision should not be requested when the agent is done * [bug-fix] Make Python able to deal with 0-step episodes (#3671) * adding some comments Co-authored-by: Ervin T <ervin@unity3d.com> * Remove vis_encode_type from list of required (#3677) * Update changelog (#3678) * Shorten timeout duration for environment close (#3679) The timeout duration for closing an environment was set to the same duration as the timeout when waiting ...	5 years ago
GitHub	de3fc4e8	Hotfix memory leak on Python (#3664) * Hotfix memory leak on Python * Fixing * Fixing a bug in the heuristic policy. A decision should not be requested when the agent is done * [bug-fix] Make Python able to deal with 0-step episodes (#3671) * adding some comments Co-authored-by: Ervin T <ervin@unity3d.com>	5 years ago
Andrew Cohen	93d344ff	simple rl asymm ghost tests	5 years ago
GitHub	bc1fdf07	[refactor] CLI changes (#3705)	5 years ago
Andrew Cohen	59b88be6	Merge branch 'master' into self-play-mutex	5 years ago
GitHub	9cbc3fa2	Asymmetric self-play (#3653)	5 years ago
Anupam Bhatnagar	50e52d9c	Merge branch 'master' into distributed-training	5 years ago
GitHub	d7ca6b8d	[feature] Add --initialize-from option (#3710)	5 years ago
Anupam Bhatnagar	001fce2a	first commit	5 years ago
GitHub	43f23ee3	WIP : Changes to the LL-API - Refactor of “done” logic (#3681) * [skip ci] WIP : Modify the base_env.py file * [skip ci] typo * [skip ci] renamed some methods * [skip ci] Incorporated changes from our meeting * [skip ci] everything is broken * [skip ci] everything is broken * [skip ci] formatting * Fixing the gym tests * Fixing bug, C# has an error that needs fixing * Fixing the test * relaxing the threshold of 0.99 to 0.9 * fixing the C# side * formating * Fixed the llapi integratio test * [Increasing steps for testing] * Fixing the python tests * Need __contains__ after all * changing the max_steps in the tests * addressing comments * Making env_manager logic clearer as proposed in the comments * Remove duplicated logic and added back in episode length (#3728) * removing mentions of multi-agent in gym and changed the docstring in base_env.py * Edited the Documentation for the changes to the LLAPI (#3733) * Edite...	5 years ago
Anupam Bhatnagar	9341f7a2	[skip-ci] small refactors	5 years ago
Arthur Juliani	7c3bd376	Refactoring policy and optimizer	5 years ago
Arthur Juliani	b997f214	Share more code between tf and torch policies	5 years ago
GitHub	232519e4	[refactor] Move output artifacts to a single results/ folder (#3829)	5 years ago
Arthur Juliani	1736559f	Combine actor and critic classes. Initial export.	5 years ago
Arthur Juliani	ca887743	Support tf and pytorch alongside one another	5 years ago
GitHub	d2bc86c8	Release 2 cherry pick (#3971) * [bug-fix] Fix issue with initialize not resetting step count (#3962) * Develop better error message for #3953 (#3963) * Making the error for wrong number of agents raise consistently * Better error message for inputs of wrong dimensions * Fix #3932, stop the editor from going into a loop when a prefab is selected. (#3949) * Minor doc updates to release * add unit tests and fix exceptions (#3930) Co-authored-by: Ervin T <ervin@unity3d.com> Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com> Co-authored-by: Chris Goy <christopherg@unity3d.com>	5 years ago
Arthur Juliani	89ad3020	Merge remote-tracking branch 'origin/master' into develop-add-fire # Conflicts: # ml-agents/mlagents/trainers/policy/tf_policy.py	5 years ago
Christopher Goy	ba80b292	format files with pre-commit.	4 years ago
GitHub	abbc6424	[bug-fix] Fix issue with initialize not resetting step count (#3962)	5 years ago
Arthur Juliani	28e095e0	Merge remote-tracking branch 'origin/master' into develop-add-fire	5 years ago
GitHub	e92b4f88	[refactor] Structure configuration files into classes (#3936)	5 years ago
GitHub	335cff3e	[versioning] Save ML-Agents version in checkpoints and check on load (#4035)	5 years ago
GitHub	a1c63c4b	Release 3 Cherry-pick bug-fixes and doc changes from master (#4102) * [bug-fix] Fix regression in --initialize-from feature (#4086) * Fixed text in GettingStarted page specifying the logdir for tensorboard. Before it was in a directory summaries which no longer existed. Results are now saved to the results dir. (#4085) * [refactor] Remove nonfunctional `output_path` option from TrainerSettings (#4087) * Reverting bug introduced in #4071 (#4101) Co-authored-by: Scott <Scott.m.jordan91@gmail.com> Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>	4 years ago
Anupam Bhatnagar	4afd8f92	first commit	4 years ago
Arthur Juliani	9724c9ac	Merge master	4 years ago
Ervin Teng	510583d2	Move memory validation to settings	4 years ago
GitHub	a28e2767	Update add-fire to latest master, including Policy refactor (#4263) * Update Dockerfile * Separate send environment data from reset (#4128) * Fixed a typo on ML-Agents-Overview.md (#4130) Fixed redundant "to" word from the sentence since it is probably a typo in document. * Updated the badge’s link to point to the newest doc version * Replaced all of the doc to release_3_doc * Fix 3DBall and 3DBallHard SAC regressions (#4132) * Move memory validation to settings * Update docs * Add settings test * Update to release_3 in installation.md (#4144) * rename to SideChannelManager +backcompat (#4137) * Remove comment about logo with --help (#4148) * [bugfix] Make FoodCollector heuristic playable (#4147) * Make FoodCollector heuristic playable * Update changelog * script to check for old release links and references (#4153) * Remove package validation suite from Project (#4146) * RayPerceptionSensor: handle empty and invalid tags (#4155...	4 years ago
Ruo-Ping Dong	6feec58a	add Saver class (only TF working)	4 years ago
Ruo-Ping Dong	71fe4df6	fix formatting and test	4 years ago
Ruo-Ping Dong	b4713baa	small improvements	4 years ago
GitHub	3bcb029b	[refactor] Remove BrainParameters from Python code (#4138)	4 years ago
yanchaosun	5a778ca3	fix normalization	4 years ago
yanchaosun	a212fef9	new bisim implementation	4 years ago
GitHub	84440f05	Convert checkpoints to .NN (#4127) This change adds an export to .nn for each checkpoint generated by RLTrainer and adds a NNCheckpointManager to track the generated checkpoints and final model in training_status.json. Co-authored-by: Jonathan Harper <jharper+moar@unity3d.com>	4 years ago
GitHub	1f5eb9da	add pyupgrade to pre-commit and run (#4239)	4 years ago
GitHub	129f9ddc	[MLA-427] make pyupgrade convert f-strings too (#4244) * make pyupgrade convert f-strings too	4 years ago
GitHub	1b098c9a	Refactor TFPolicy and Policy (#4254) * Refactor TFPolicy and Policy	4 years ago
GitHub	380fef57	[refactor] Move TF-specific files to tf/ folder (#4266)	4 years ago
Andrew Cohen	06e4356c	Merge branch 'master' into sensitivity	4 years ago
Andrew Cohen	18ff42a6	use mean of first trajectory to initialize the normalizer	4 years ago
Andrew Cohen	5878b952	remove blank line	4 years ago
Andrew Cohen	ce9bcefe	cleaned up initialization of variance/mean	4 years ago
Ruo-Ping Dong	95858e25	update saver interface and add tests	4 years ago
Anupam Bhatnagar	87bdf353	[skip ci] save model on worker zero only	4 years ago
Anupam Bhatnagar	d3e8f124	removing horovod from tf policy	4 years ago
Anupam Bhatnagar	abc1220f	Merge branch 'master' into global-variables	4 years ago
Chris Elion	d2133d83	comments and cleanup	4 years ago
GitHub	25dc8c3d	Add Saver Class to handle all save/load/checkpoint/export work (#4323)	4 years ago
Andrew Cohen	a65d08c7	ghost trainer tests	4 years ago
GitHub	49545ce1	Pytorch ghost trainer (#4370)	4 years ago
Andrew Cohen	71f9c241	fix tf policy for ghosts	4 years ago
Anupam Bhatnagar	5e8aa485	renaming file from globals.py to global_values.py	4 years ago
Anupam Bhatnagar	71c301bc	minor fixes	4 years ago
Anupam Bhatnagar	1f60979f	[skip ci] change self.rank to global_values.get_rank()	4 years ago
Anupam Bhatnagar	f4f1a8d9	merge master into trainer-plugin branch	4 years ago
Andrew Cohen	fc3027ac	tf tests except gail pass	4 years ago
Andrew Cohen	f654df34	fixing tensorflow tests	4 years ago
GitHub	cb8e4d25	Add ActionSpec (#4586) Co-authored-by: Ervin T <ervin@unity3d.com>	4 years ago
Andrew Cohen	9689cf2c	remove _action_ from function names	4 years ago
GitHub	b853e5ba	Action buffer (#4612) Co-authored-by: Ervin T <ervin@unity3d.com> Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>	4 years ago
GitHub	3c96a3a2	Action Model (#4580) Co-authored-by: Ervin T <ervin@unity3d.com> Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>	4 years ago
GitHub	88d3ec3e	Merge master into hybrid actions staging branch (#4704)	4 years ago
GitHub	87a7ccf8	use int64 steps, check for NaN actions (#4607) * use int64 steps * check for NaN actions Co-authored-by: Ruo-Ping Dong <ruoping.dong@unity3d.com>	4 years ago
Andrew Cohen	8172b3d6	test_simple_rl/reward providers pass tf/torch	4 years ago
GitHub	a0d1c829	Action Docs part2 (#4739) * reduce usage of "vector action" and "action space" * more cleanup * undo GettingStarted change for now * batch size description * Apply suggestions from code review Co-authored-by: andrewcoh <54679309+andrewcoh@users.noreply.github.com> Co-authored-by: andrewcoh <54679309+andrewcoh@users.noreply.github.com>	4 years ago
Andrew Cohen	cd73cce2	test_trajectory fixed	4 years ago
Andrew Cohen	97d94a83	fix test_tf_policy	4 years ago
Andrew Cohen	498b1ee6	Merge branch 'develop-action-buffer' into develop-hybrid-actions-singleton	4 years ago
Andrew Cohen	35769b53	Merge branch 'develop-action-buffer' into develop-hybrid-actions-singleton	4 years ago
GitHub	9d8a7d6f	Update ml-agents/mlagents/trainers/policy/tf_policy.py Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>	4 years ago
Andrew Cohen	7ba10239	remove action spec attribute from policy	4 years ago
Andrew Cohen	662fd6b1	added docstrings to action flattener	4 years ago
GitHub	cc948a41	Policy output actiontuple (#4651)	4 years ago


152 commits (4671cf17-7b01-4722-935e-99b514a6ebbe)