ml-agents

Author	SHA1	Message	Commit date
GitHub	8317a659	Behavioral Cloning & Trainers Reorg (#328) * Implement behavioral cloning for cc/dc, fc/rnn, state/observations. * Re-organize folder structure in anticipation of unitytrainers as a package. * Create demo environment BananaImitation to validate behavioral cloning. * Fixes #336	6 years ago
GitHub	e11dae1d	Python Testing & Image Inference Improvements (#353) * Reorganized python tests into separate folder, and make individual test files for different (sub) modules. * Add tests for trainer_controller, PPO, and behavioral cloning. More to come soon. * Minor bug fixes discovered while writing tests. * Reworked GridWorld to reset much faster. * Cleaned ObservationToTex and reworked GetObservationMatrixList to be 3x faster.	6 years ago
Arthur Juliani	b8a4f5f1	Add Hallway environment to validate LSTM models	6 years ago
Arthur Juliani	c3644f56	Buffer fix for properly masking gradients	6 years ago
GitHub	f134016b	On Demand Decision (#308) * On Demand Decision : Use RequestDecision and RequestAction * New Agent Inspector : Use it to set On Demand Decision * New BrainParameters interface * LSTM memory size is now set in python * New C# API * Semantic Changes * Replaced RunMDP * New Bouncer Environment to test On Demand Decision	6 years ago
GitHub	a7c9096f	[Semantics] Modified the placeholder names (#381)	6 years ago
GitHub	848b8a58	Fix PPO regression (#434) * Fix PPO regression	6 years ago
GitHub	237b41f9	Hotfix 0.3.0c (#618) Fixes the following issues: * Missing component reference in BananaRL environment. * Neural Network for multiple visual observations was not properly generated. * Episode time-out value estimate bootstrapping used incorrect observation as input.	6 years ago
GitHub	3b866e9f	Use Clipped Gaussian (#649) This PR makes the following changes: * Moves clipping of continuous control model into model itself. Output is now always [-1, 1]. * Internal model values are now clipped between [-3, 3] before being rescaled to [-1, 1] for output. * This improves training performance by providing a wider range of values within which the pdf of the gaussian can fall. Output of [-1, 1] is used to be more environment-creator friendly. * Fixes issue where epsilon was erroneously being used to reconstruct old probabilities during PPO update, leading to reduced learning performance. * Introduce ScaleAction() function within python to easily rescale values from [-1, 1] to arbitrary range. * Re-train all CC models using improved algorithm. All performance levels are equal or improved. In the case of Crawler, improvement is drastic. * Update documentation appropriately. * Made miscellaneous minor code style and optimization improvements within environments.	6 years ago
GitHub	c17937ef	Curiosity Driven Exploration & Pyramids Environments (#739) * Adds implementation of Curiosity-driven Exploration by Self-supervised Prediction (https://arxiv.org/abs/1705.05363) to PPO trainer. * To enable, set use_curiosity flag to true in hyperparameter file. * Includes refactor of unitytrainers model code to accommodate new feature. * Adds new Pyramids environment (w/ documentation). Environment contains sparse reward, and can only be solved using PPO+Curiosity.	6 years ago
Arthur Juliani	5d402be9	Minor Optimizations (#836)	6 years ago
GitHub	282d5bd4	Fix Pytests (#843)	6 years ago
GitHub	a720e370	Fix bug and update tests (#850)	6 years ago
GitHub	47fc38ab	Additional Tests & Bug Fixes (#854) * Add tests and fix for sparse tensor warning * Rename mock communicator parameter * Test longer sequences * Curiosity tests and bug fixes	6 years ago
GitHub	6df07946	Fix for Discrete observations + Curiosity (#866)	6 years ago
Arthur Juliani	5e48766d	Remove discrete observations	6 years ago
Arthur Juliani	b46b8708	Rename function	6 years ago
Arthur Juliani	12d52cb0	Replace tanh on cc models w/ swish	6 years ago
GitHub	e50ac7ae	Merge branch 'develop' into hotfix-0	6 years ago
Arthur Juliani	3659bbcd	Develop multi discrete (#1022) Replace discrete control with multi-discrete control.	6 years ago
Deric Pang	634280a6	Fixed imports, all tests are passing.	6 years ago
GitHub	ded0d8c7	Develop action masking (#1080) * [Initial Commit] Modified the model.py file and the ppo/trainer.py file to use masked actions * Preliminary modifications to the python side of the code to enable action masking * Preliminary modifications to the C# side of the code to enable action masking * Preliminary modifications to the communication side of the code to enable action masking * Implemented action masking for BC Note : The actions of the teacher are not masked * More error messages for the action masking * fix pytests * Added Documentation * Address comment * Addressed Comments on docs * Addressed second comment on docs * Addressed comments for the python side of the code * Created the action masker and associated unit tests * Addressed comments on the C# side * Addressed the comment regarding action_masking_name * Addressed the comments	6 years ago
GitHub	2e489abc	Normalization of the probabilities after masking (#1123) * python/unitytrainers/bc/models.py * Updated BC to reflect the changes	6 years ago
Deric Pang	cdb41480	Merge remote-tracking branch 'upstream/develop' into develop-flat-code-restructure	6 years ago
Deric Pang	d4ca94a1	Merge remote-tracking branch 'upstream/develop' into develop-flat-code-restructure	6 years ago
GitHub	fbf92810	Refactor Trainers to use Policy (#1098)	6 years ago
GitHub	10d2a19d	Release v0.5 (Develop) (#1203)	6 years ago
GitHub	2d4b4209	Use single scope declaration for models (#1160)	6 years ago
GitHub	ab6eb8dc	Fix TF Nan bug (#1178) * Fix for TF NaNs * New soccer model	6 years ago
GitHub	6c354d16	New Learning Brain (#1303) * Initial Commit * attempt at refactor * Put all static methods into the CoreInternalBrain * improvements * more testing * modifications * renamed epsilon * misc * Now supports discrete actions * added discrete support and RNN and visual. Left to do is refactor and save variables into models * code cleaning * made a tensor generator and applier * fix on the models.py file * Moved the Checks to a different Class * Added some unit tests * BugFix * Need to generate the output tensors as well as inputs before executing the graph * Made NodeNames static and created a new namespace * Added comments to the TensorAppliers * Started adding comments on the TensorGenerators code * Added comments for the Tensor Generator * Moving the helper classes into a separate folder * Added initial comments to the TensorChecks * Renamed NodeNames -> TensorNames * Removing warnings in tests * Now using Aut...	6 years ago
vincentpierre	03a8b7ed	fix discrete curiosity	6 years ago
vincentpierre	eb4e23a7	making masked actions impossible instead of improbable	6 years ago
GitHub	249e86a4	Ticked API : (#1696) * Ticked API : - Ticked API for pypi for mlagents - Ticked API for pypi for unity-gym - Ticked Communication number for API - Ticked Model Loader number for API * Ticked the API for the pytest	5 years ago
Ervin T	b30f4c90	Split `mlagents` into two packages (#1812) * Reorganize project * Fix all tests * Address comments * Delete init file * Update requirements * Tick version * Add timeout wait parameter (mlagents_envs) (#1699) * Add timeout wait param * Remove unnecessary function * Add new meta files for communicator objects * Fix all tests * update circleci * Reorganize mlagents_envs tests * WIP: test removing circleci cache * Move gym tests * Namespaced packages * Update installation instructions for separate packages * Remove unused package from setup script * Add Readme for ml-agents-envs * Clarify docs and re-comment compiler in make.bat * Add more doc to installation * Add back fix for Hololens * Recompile Protobufs * Change mlagents_envs to mlagents.envs in trainer_controller * Remove extraneous files, fix win bat script * Support Python 3.7 for envs package	5 years ago
eshvk	ef8009d9	Python code reformat via [`black`](https://github.com/ambv/black). Features: - Reformat code via black. - Adding circleci configurations. - Add contribution guidelines. Steps to reproduce: - `pip install black` - `black <source code directory>`	5 years ago
GitHub	4ac79742	Refactor reward signals into separate class (#2144) * Create new class (RewardSignal) that represents a reward signal. * Add value heads for each reward signal in the PPO model. * Make summaries agnostic to the type of reward signals, and log weighted rewards per reward signal. * Move extrinsic and curiosity rewards into this new structure. * Allow defining multiple reward signals in YAML file. Add documentation for this new structure.	5 years ago
GitHub	b05c9ac1	Add environment manager for parallel environments (#2209) Previously in v0.8 we added parallel environments via the SubprocessUnityEnvironment, which exposed the same abstraction as UnityEnvironment while actually wrapping many parallel environments via subprocesses. Wrapping many environments with the same interface as a single environment had some downsides, however: * Ordering needed to be preserved for agents across different envs, complicating the SubprocessEnvironment logic * Asynchronous environments with steps taken out of sync with the trainer aren't viable with the Environment abstraction This PR introduces a new EnvManager abstraction which exposes a reduced subset of the UnityEnvironment abstraction and a SubprocessEnvManager implementation which replaces the SubprocessUnityEnvironment.	5 years ago
GitHub	d80d5852	add some types to the reward signals (#2215) * WIP add some types to the reward signals * fix next_visual_in * cleanup TODO * fix bad merge	5 years ago
GitHub	be4292fb	Add different types of visual encoder (nature cnn/resnet) Add resnet and nature cnn in addition to default visual encoder	5 years ago
GitHub	6225317d	refactor vis_encoder_type and add to doc refactor vis_encoder_type and add to doc	5 years ago
GitHub	a9fe719c	Add Multi-GPU implementation for PPO (#2288) Add MultiGpuPPOPolicy class and command line options to run multi-GPU training	5 years ago
GitHub	7b69bd14	Refactor Trainer and Model (#2360) - Move common functions to trainer.py, model.py from ppo/trainer.py, ppo/policy.py and ppo/model.py - Introduce RLTrainer class and move most of add_experiences and some common reward signal code there. PPO and SAC will inherit from this, not so much BC Trainer. - Add methods to Buffer to enable sampling, truncating, and save/loading. - Add scoping to create encoders in model.py	5 years ago
GitHub	3683cc1c	Enable learning rate decay to be disabled (#2567)	5 years ago
GitHub	4980b904	Cleanup visual obs setup (#2647) * DRY up the setup code * fstrings	5 years ago
GitHub	68965c7b	Use a class for camera res, not dict (#2656)	5 years ago
Chris Elion	43e23941	rough pass at tf2 support, needs cleanup	5 years ago
Chris Elion	806c77e4	centralize tensorflow imports	5 years ago
Chris Elion	8da16bdb	move compat functions	5 years ago
GitHub	4da157fe	more pylint fixes (#2842)	5 years ago
GitHub	e6f549dc	[MLA-12] update protobuf for vector observations (#2862)	5 years ago
Chris Elion	fca51de8	Merge remote-tracking branch 'origin/develop' into try-tf2-support	5 years ago
Chris Elion	73a346cb	cleanup	5 years ago
GitHub	f57b7ac6	Allow usage with tensorflow 2.0.0 (via tf.compat.v1) (#2665)	5 years ago
GitHub	36048cb6	Moving Env Manager to Trainers (#3062) The Env Manager is only used by the trainer codebase. The entry point to interact with an environment is UnityEnvironment. * Moving Env Manager to Trainers * fix pylint madness	5 years ago
GitHub	42bea858	Improve mypy coverage by adding --namespace-packages (#3049)	5 years ago
GitHub	2fd305e7	Move add_experiences out of trainer, add Trajectories (#3067)	5 years ago
GitHub	3de3c1f1	check min size for visual encoders (#3112) * check min size for visual encoders * friendlier exception * fix typo	5 years ago
Ervin Teng	69e7eeac	Normalize based on number of elements	5 years ago
Ervin Teng	0046ea2d	Add comment	5 years ago
Ervin Teng	0040dc7f	New way to update mean and var	5 years ago
Ervin Teng	f80b1d12	Use running norm and std	5 years ago
GitHub	f058b18c	Replace BrainInfos with BatchedStepResult (#3207)	5 years ago
Ervin Teng	03c750a7	Move some functionality to optimizer	5 years ago
Ervin Teng	cd74e51b	More progress	5 years ago
Ervin Teng	2373cae8	Move methods into common optimizer	5 years ago
Ervin Teng	9ad99eb6	Combined model and policy for PPO	5 years ago
Ervin Teng	28f7608f	Clean up value head creation	5 years ago
Ervin Teng	6bbcf2d7	Add typing to value head creator	5 years ago
Ervin Teng	08cb91de	Remove __init__ for LearningModel static class	4 years ago
Ervin Teng	a6e28cf4	Fix for visual obs	4 years ago
Ervin Teng	7004604d	Used NamedTuple for create normalization tensors	4 years ago
Ervin Teng	a990e5e8	Add docstrings for model	4 years ago
GitHub	c145e75b	Split Policy and Optimizer, common Policy for PPO and SAC (#3345)	4 years ago
Ervin Teng	53c25fb1	Move one-hot out of policy and remove selected_actions	4 years ago
Anupam Bhatnagar	f4dbedcf	removed extraneous logging imports and loggers	4 years ago
GitHub	94de596b	[change] Remove concatenate in discrete action probabilities to improve inference performance (#3598)	4 years ago
Arthur Juliani	8c6f4696	Fix a couple additional bugs	4 years ago
Andrew Cohen	4a3ad193	Add constant decay to beta and epsilon	4 years ago
GitHub	c5b94ca6	Use LR schedule for beta and epsilon (#3940)	4 years ago
Arthur Juliani	2b3a6347	Merge remote-tracking branch 'origin/master' into develop-add-fire	4 years ago
Christopher Goy	ba80b292	format files with pre-commit.	4 years ago
yanchaosun	ac4c80c2	integrate the implementation and hyperparameters	4 years ago
GitHub	a28e2767	Update add-fire to latest master, including Policy refactor (#4263) * Update Dockerfile * Separate send environment data from reset (#4128) * Fixed a typo on ML-Agents-Overview.md (#4130) Fixed redundant "to" word from the sentence since it is probably a typo in document. * Updated the badge’s link to point to the newest doc version * Replaced all of the doc to release_3_doc * Fix 3DBall and 3DBallHard SAC regressions (#4132) * Move memory validation to settings * Update docs * Add settings test * Update to release_3 in installation.md (#4144) * rename to SideChannelManager +backcompat (#4137) * Remove comment about logo with --help (#4148) * [bugfix] Make FoodCollector heuristic playable (#4147) * Make FoodCollector heuristic playable * Update changelog * script to check for old release links and references (#4153) * Remove package validation suite from Project (#4146) * RayPerceptionSensor: handle empty and invalid tags (#4155...	4 years ago
GitHub	3bcb029b	[refactor] Remove BrainParameters from Python code (#4138)	4 years ago
GitHub	129f9ddc	[MLA-427] make pyupgrade convert f-strings too (#4244) * make pyupgrade convert f-strings too	4 years ago
GitHub	380fef57	[refactor] Move TF-specific files to tf/ folder (#4266)	4 years ago
Andrew Cohen	41216d7a	test initialize steps to 100	4 years ago
yanchaosun	36f36750	target critic for ppo	4 years ago
Andrew Cohen	18ff42a6	use mean of first trajectory to initialize the normalizer	4 years ago
Andrew Cohen	ce9bcefe	cleaned up initialization of variance/mean	4 years ago
Andrew Cohen	4b094d25	large normalization obs unit test	4 years ago
Ervin Teng	d65a9326	Merge branch 'master' into develop-add-fire-mm3	4 years ago
GitHub	bf6506fc	[feature] Add small CNN for grids 5x5 and up (#4434)	4 years ago
GitHub	88d3ec3e	Merge master into hybrid actions staging branch (#4704)	4 years ago

94 commits (bf68edcf-4743-4d9a-9d0c-8a70e163ff84)